Abstract

This paper introduces a new Importance Sampling scheme, called Adaptive
Twisted Importance Sampling, which is adequate for the improved estimation
of rare event probabilities in the range of moderate deviations pertaining to the
empirical mean of real i.i.d. summands. It is based on a sharp approximation
of the density of long runs extracted from a random walk conditioned on its end
value.
0.1 Introduction and notation

Importance Sampling (IS) procedures aim at reducing the calculation time which is
necessary in order to evaluate integrals, often in large dimension. We consider
the case when the integral to be numerically computed is the probability of an
event defined by a large number of random components; this case has received
quite a lot of attention, above all when the event is of small probability, typically
of order $10^{-8}$ or so, as occurs frequently in industrial applications or in
communication devices. The order of magnitude of the probability to be estimated is
here somewhat larger, and we aim at coping with "moderate probabilities" as dealt
with in statistics. The basic situation in IS can be stated as follows.
Let $Z$ be some random variable, say on $\mathbb{R}$, with probability measure $P$ and
density $p$. Let $A$ be a subset of $\mathbb{R}$ with $P(A) > 0$. Let $Z_1^L := (Z_1, \ldots, Z_L)$ denote
a sample of i.i.d. observations of $Z$. By the law of large numbers

$$P_L := \frac{1}{L}\sum_{l=1}^{L} 1_A(Z_l) \tag{1}$$

estimates $P(A)$ without bias, when the $Z_l$'s are sampled under the density $p$.
An alternative unbiased estimate of $P(A)$ can be defined through

$$P_L^g := \frac{1}{L}\sum_{l=1}^{L} \frac{p(Y_l)}{g(Y_l)}\, 1_A(Y_l) \tag{2}$$

for any density $g$ when the support of $p$ is a subset of the support of $g$, and
the $Y_l$'s are i.i.d. observations of a r.v. $Y$ with density $g$. As is well known the
optimal choice for the IS sampling density $g$ is $p_{Z/A}$, the density of $Z$ conditioned
upon the event $(Z \in A)$, unfortunately an impracticable choice which presumes
the knowledge of $P(A)$, the quantity to be estimated. Were this sampling
density at hand, the required number $L$ of replications of $Y$ to be performed
would reduce to 1 and the estimate would be exactly $P(A)$. This fact motivates
efforts in order to approximate $p_{Z/A}$ in the case when the variable $Z$ has a
distribution which allows it. Sometimes the random variable $Z$ is obtained as
a function of a large number of random variables, say $X_1^n := (X_1, \ldots, X_n)$, and
the event $(Z \in A)$ is of small or moderate probability. Also the density of $Z$
cannot be evaluated analytically, due to the very definition of $Z$, but the random
variables $X_i$ have known distribution. This happens for instance when $Z$ is
a moment estimator or when it is the linear part of the expansion of an M or
L-estimate (see Section 4). The example which we have in mind is the following,
which serves as a benchmark case in the IS literature.

The r.v.'s $X_i$ are i.i.d., centered with variance 1, with common density
$p_X$ on $\mathbb{R}$, and

$$Z := \frac{1}{n}\sum_{i=1}^{n} X_i =: \frac{S_1^n}{n}$$

is the empirical mean of the $X_i$'s. The set $A$ is

$$A := (a_n, \infty) \tag{3}$$

where $a_n$ tends slowly to $E(X_1)$ from above and we intend to estimate

$$P_n := P\left(\frac{S_1^n}{n} \in A\right)$$

for large but fixed $n$. Many asymptotic results provide sharp estimates for
$P(Z \in A)$ but it is a known fact that asymptotic expansions are not always
good tools when dealing with numerical approximations for fixed (even large)
$n$. For example, citing Ermakov (2004, p 624, [7]), the Berry-Esseen
approximation for the evaluation of risks of order $10^{-2}$ in testing is pertinent for sample
sizes of order 5000-10000; also the accuracy of available moderate deviation
probabilities as developed by Inglot, Kallenberg and Ledwina in [11] has not
been investigated. This motivates our interest in numerical techniques in this
field.

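According to (1) the basic estimate of $P(Z \in A)$ is defined as follows:
generate $L$ i.i.d. samples $X_1^n(l)$, $1 \le l \le L$, with underlying density $p_X$ and define

$$P^{(n)}(\mathcal{E}_n) := \frac{1}{L}\sum_{l=1}^{L} 1_{\mathcal{E}_n}\left(X_1^n(l)\right)$$

where

$$\mathcal{E}_n := \left\{(x_1, \ldots, x_n) \in \mathbb{R}^n : s_1^n/n > a_n\right\}. \tag{4}$$

Here $s_1^n := x_1 + \ldots + x_n$. The statistic $P^{(n)}(\mathcal{E}_n)$ estimates the moderate deviation
probability of the sample mean of the $X_i$'s. Also, denoting $g$ a sampling density
of the vector $Y_1^n$, the associated IS estimate is

$$P_g^{(n)}(\mathcal{E}_n) := \frac{1}{L}\sum_{l=1}^{L} \frac{p_{X_1^n}\left(Y_1^n(l)\right)}{g\left(Y_1^n(l)\right)}\, 1_{\mathcal{E}_n}\left(Y_1^n(l)\right). \tag{5}$$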

In the range of moderate deviations the two major contributions to IS
schemes for the estimation of $P_n$ are Fuh and Hu [10] and Ermakov [8]. The
paper by Fuh and Hu does not consider events of moderate deviations as
intended here; it focuses on IS schemes for the estimation of $P(Z \in A)$ where $Z$
is a given multinormal random vector and $A$ is a fixed set in $\mathbb{R}^d$. The authors
consider efficiency with respect to the variance of the estimate and state that
for the case of interest the efficient sampling scheme is deduced from the
distribution of $Z$ by a shift in the mean inside the set $A$. The papers by Ermakov
instead handle problems similar to ours. Ermakov's 2007 paper [8] considers
a sampling scheme where $g$ is the density of i.i.d. components. He proves that
this scheme is efficient in the sense that the computational burden necessary
to obtain a given relative precision of the estimate with respect to $P_n$ does not grow
exponentially as a function of $n$. He considers statistics of greater generality
than the sample mean, such as M and L estimators; in the range of moderate
deviations, however, the asymptotic behavior of those objects is captured through
their linear part, which is the empirical mean of their influence function, and this
puts the basic situation back at the center of the scene. We discuss efficiency
in Section 3 and present some results in connection with Ermakov's pertaining
to M and L estimators in Section 4.

The numerator in the expression (5) is the product of the $p_{X_1}(Y_i)$'s while
the denominator need not be a density of i.i.d. copies evaluated on the $Y_i$'s.
Indeed the optimal choice for $g$ is the density of $X_1^n$ conditioned upon $\mathcal{E}_n$, say
$p_{X_1^n/\mathcal{E}_n}$. Since the optimal solution is known to be $p_{X_1^n/\mathcal{E}_n}$, the better its approximation,
the better the sampling scheme, at least when it does not impose a large
calculation burden; classical sampling schemes consist in simulation of independent
copies of r.v.'s $Y_i(l)$, $1 \le i \le n$, and efficiency is defined in terms of variance of
the estimate inside this class of sampling, which, by nature, is suboptimal with
respect to sampling under good approximations of $p_{X_1^k/\mathcal{E}_n}$ for long runs, i.e. for
large $k = k_n$. The present paper explores the choice of good sampling schemes
from this standpoint. Obviously mimicking the optimal scheme results in a net
gain on the number $L$ of replications of the runs which are necessary to obtain a
given accuracy of the estimate with respect to $P_n$. However the criterion which
we consider is different from the variance, and results from an evaluation of the
MSE of our estimate on specific subsets of the runs generated by the sampling
scheme, which we call typical subsets, namely subsets having probability going to 1
under the sampling scheme as $n$ increases. On such sets, the MSE is proved to be
of very small order with respect to the variance of the classical estimate, whose
MSE cannot be diminished on any such typical subsets. We believe that this
definition makes sense and we also support it numerically. This is the scope of Section
3 in which it will be shown that the number of simulation runs
necessary to achieve an $\alpha\%$ relative error on $P_n$ drops by a factor $\sqrt{n-k}/\sqrt{n}$
with respect to the classical IS scheme.

Our proposal therefore hinges on the local approximation of the conditional
distribution of long runs $X_1^k$ from $X_1^n$. This cannot be achieved through the
classical theory of moderate deviations, first developed by De Acosta and more
recently by Ermakov; on the contrary the ad hoc procedure developed in the
range of large deviations by Diaconis and Freedman [6] for the local
approximation of the conditional distribution of $X_1^k$ given the value of $S_1^n$ is the starting
point of the present approach. We find it useful to briefly present these two
different points of view. We also mention the approximation technique for
moderate deviations of sublinear functionals of the empirical measure by Inglot,
Kallenberg and Ledwina [11], based on strong approximation techniques; these
results provide explicit equivalents for the probability of moderate deviations,
but do not lead to adequate approximations for obtaining their numerical
counterparts by IS methods.

The following notation and assumptions will be kept throughout this paper. 

We assume that $X_1$ satisfies the Cramer condition, i.e. $X_1$ has a finite
moment generating function $\Phi(t) := E \exp tX_1$ in a non void neighborhood of
0; denote

$$m(t) := \frac{d}{dt}\log\Phi(t) \tag{6}$$

and

$$s^2(t) := \frac{d}{dt}m(t) \tag{7}$$

when defined. The values of $m(t^a) := \frac{d}{dt}\log\Phi(t^a)$ and $s^2(t^a) := \frac{d}{dt}m(t^a)$ are
the expectation and the variance of the tilted density

$$\pi^a(x) := \frac{\exp(t^a x)}{\Phi(t^a)}\, p(x) \tag{8}$$

where $t^a$ is the only solution of the equation $m(t) = a$ when $a$ belongs to the
support of $p$. Denote $\Pi^a$ the probability measure with density $\pi^a$. The Chernoff
function of $X_1$ is

$$I(x) := \sup_t \left(tx - \log\Phi(t)\right)$$

for $x$ in the support of $X_1$ and it holds

$$\frac{d}{dx}I(x) = m^{\leftarrow}(x), \qquad \frac{d^2}{dx^2}I(x) = \frac{1}{s^2 \circ m^{\leftarrow}(x)}$$

where $m^{\leftarrow}$ denotes the reciprocal function of $m$.
Denote

$$\varphi(s) := \int_{-\infty}^{+\infty} e^{isx}\, p_X(x)\,dx$$

the characteristic function of $X_1$. Assume that

$$\int_{-\infty}^{+\infty} |\varphi(s)|^{\nu}\,ds < \infty \tag{9}$$

for some $\nu > 1$. This condition entails the validity of the Edgeworth expansions
to be used in the sequel (see e.g. Feller [5]).

The notation $p(X = x)$ is used to denote the value of the density $p$ of the r.v.
$X$ at point $x$. The notation $p(S_1^n = s)$ is used to define the value of the density
of the r.v. $S_1^n$ under $p$, i.e. when the summands are i.i.d. with density $p$. Also
we may write $p(f(X_1^n) = u)$ to denote the density (on the corresponding image
space) of some function $f$ of the sample $X_1^n$. We write $\mathfrak{P}_n$ the distribution of
$X_1^n$ given $\mathcal{E}_n$ and $p_n$ its density. The symbol $\mathfrak{n}$ denotes the standard normal
density on $\mathbb{R}$.
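
To fix ideas on the quantities $m(t)$, $s^2(t)$, $t^a$ and $I(a)$, here is a minimal Python sketch for one explicit example that is not taken from the paper: $X = E - 1$ with $E$ standard exponential, so that $X$ is centered with variance 1 and $\Phi(t) = e^{-t}/(1-t)$ for $t < 1$. The function names are hypothetical; the tilting parameter is obtained by bisection on $m(t) = a$ and can be compared with the closed forms $t^a = a/(1+a)$ and $I(a) = a - \log(1+a)$ valid for this example.

```python
import math


def log_mgf(t):
    """log Phi(t) for X = E - 1 with E ~ Exp(1); finite for t < 1 (illustrative choice)."""
    return -t - math.log(1.0 - t)


def m(t):
    """m(t) = d/dt log Phi(t): mean of the tilted density with parameter t."""
    return t / (1.0 - t)


def s2(t):
    """s^2(t) = d/dt m(t): variance of the tilted density with parameter t."""
    return 1.0 / (1.0 - t) ** 2


def solve_tilt(a, lo=-50.0, hi=1.0 - 1e-12, tol=1e-13):
    """Solve m(t) = a by bisection; m is increasing on (-inf, 1)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chernoff(a):
    """I(a) = t^a * a - log Phi(t^a), the supremum being attained at t^a."""
    t = solve_tilt(a)
    return t * a - log_mgf(t)


if __name__ == "__main__":
    a = 0.1
    print(solve_tilt(a), a / (1.0 + a))      # numerical vs closed-form t^a
    print(chernoff(a), a - math.log1p(a))    # numerical vs closed-form I(a)
```

For a general density $p$ the closed forms are not available and only the numerical route applies, provided $\Phi$ can be evaluated.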



0.1.1 From moderate deviations to conditional distributions

A basic requirement for a good IS sampling scheme is that it mimics the
conditional density $p_{X_1^n/\mathcal{E}_n}$. We first expose a general argument in this direction in
order to clarify that there is no bypass through the general theory of large or
moderate deviations to achieve this goal. Also the present discussion motivates
the choices of classical IS sampling schemes (Ermakov), emphasizing that the
general theory provides the proof that the marginal conditional distribution of
$X_1$ under $\mathcal{E}_n$ is well approximated by $\Pi^{a_n}$, a statement which is usually referred
to as a Gibbs conditional principle. We need some tools from the moderate
deviation principle as developed by [7] following [5].

Let $F$ be a class of measurable functions defined on $\mathbb{R}$ and $M_F$ be the class
of all signed finite measures on $\mathbb{R}$ which satisfy

$$\int |f|\,d|Q| < \infty \quad \text{for all } f \text{ in } F.$$
On $M_F$ define the $\tau_F$ topology, which is the coarsest for which all mappings
$Q \mapsto \int f\,dQ$ ($Q \in M_F$) are continuous for all $f$ in $F$. For $P$ a probability
measure and $Q$ in $M(\mathbb{R})$ the so-called Chi-square distance between $P$ and $Q$ is
defined through




whenever $Q$ is absolutely continuous with respect to $P$, and equals $+\infty$
otherwise.

The following moderate deviation Sanov result holds; see [8]. Assume that
$a_n$ tends to 0 and $a_n\sqrt{n}$ tends to infinity.

Let $P_n := \frac{1}{n}\sum_{i=1}^{n}\delta_{X_i}$ denote the empirical measure pertaining to an i.i.d.
sample $X_1, X_2, \ldots, X_n$. Write $M_n := \frac{1}{a_n}(P_n - P)$. It holds

$$-\inf_{Q \in \mathrm{int}(B)}\chi^2(Q,P) \le \liminf_{n\to\infty}\frac{1}{na_n^2}\log\Pr(M_n \in B) \le \limsup_{n\to\infty}\frac{1}{na_n^2}\log\Pr(M_n \in B) \le -\inf_{Q \in \mathrm{cl}(B)}\chi^2(Q,P) \tag{10}$$

where the interior and closure of the set $B$ refer to the $\tau_F$ topology on $M_F$.

Consider now the asymptotic distribution of $X_1$ conditionally upon the
sequence of events $(S_1^n/n > a_n x)$, so-called moderate deviation events. With
$F := B(\mathbb{R}) \cup \{v \mapsto v\}$ and $B(\mathbb{R})$ the class of all bounded measurable functions,
(10) holds with $B$ substituted by $\Omega_x$, the subset of $M_F$ defined through

$$\Omega_x := \left\{Q : \int t\,dQ(t) \ge x \text{ and } \int dQ(t) = 0\right\}.$$

With $P$ the probability measure of the r.v. $X_1$ denote $Q^*$ the $\chi^2$ projection
of $P$ on $\Omega_x$, namely

$$Q^* := \arg\inf\left\{\chi^2(Q,P),\ Q \in \Omega_x\right\}.$$

The set $\Omega_x$ is closed in $M_F(\mathbb{R})$ equipped with the $\tau_F$ topology. Existence of
a $\chi^2$ projection of $P$ on a $\tau_F$-closed subset of $M(\mathbb{R})$ holds as a consequence
of Theorem 2.6 in [3] when $\int |f|\,dP$ is finite for all $f$ in $F$, which clearly holds
since $E|X_1|$ is finite. Uniqueness follows from the convexity of $\Omega_x$ and the strict
convexity of $Q \mapsto \chi^2(Q,P)$. From (10) it can easily be obtained that

$$P(X_1 \in A/\mathcal{E}_{n,x}) = P(A) + a_n x\, Q^*(A) + o(a_n) \tag{11}$$

with $\mathcal{E}_{n,x} := (S_1^n > na_n x)$, which in turn yields the following

Proposition 1 With the above notation

$$P(X_1 \in A/\mathcal{E}_{n,x}) = \int_A \pi^{a_n x}(y)\,dy + o(1). \tag{12}$$

The proofs of (11) and of the above Proposition are deferred to the appendix.
This approach cannot provide an equivalent expression for the conditional density of
$X_1$, which requires strong regularity assumptions. Furthermore it cannot be
extended to the case of interest here, when $X_1$ is substituted by $X_1^k$ for large
values of $k = k_n$, i.e. when an approximation of the law of the path $X_1^n$ is
needed, at least on long runs.

However the result in Proposition 1 is a strong argument in favor of
Ermakov's sampling scheme, namely simulating i.i.d. r.v.'s with common density
$\pi^{a_n}$ in (5).
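
As an illustration of the i.i.d. tilted scheme just mentioned, the following Python sketch estimates $P_n$ through (5) in one special case chosen here for convenience and not taken from the paper: standard normal summands, for which $\Phi(t) = \exp(t^2/2)$, $t^a = a$, and $\pi^{a_n}$ is the $N(a_n, 1)$ density. The likelihood ratio of a path with sum $s$ is then $\exp(-a_n s + n a_n^2/2)$, and the exact value $P(S_1^n/n > a_n)$ is a Gaussian tail, which gives a reference for comparison. Function names and parameter values are hypothetical.

```python
import math
import random


def is_estimate_tilted(n, a_n, L, seed=0):
    """IS estimate of P(S_1^n / n > a_n) for N(0,1) summands,
    sampling each path i.i.d. from the tilted density N(a_n, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(L):
        s = sum(rng.gauss(a_n, 1.0) for _ in range(n))  # sum of one tilted path
        if s > n * a_n:                                  # indicator of the event E_n
            # likelihood ratio p(y_1^n) / pi^{a_n}(y_1^n) for Gaussian summands
            total += math.exp(-a_n * s + 0.5 * n * a_n ** 2)
    return total / L


def exact_gaussian_tail(n, a_n):
    """Exact P(S_1^n / n > a_n) when the summands are N(0,1)."""
    return 0.5 * math.erfc(math.sqrt(n) * a_n / math.sqrt(2.0))


if __name__ == "__main__":
    n, a_n, L = 100, 0.3, 4000
    print(is_estimate_tilted(n, a_n, L, seed=1), exact_gaussian_tail(n, a_n))
```

About half of the tilted paths fall in $\mathcal{E}_n$ here, whereas under the original density almost none would; this is the practical content of sampling under $\pi^{a_n}$.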

0.1.2 Density of a partial path conditioned on the exact value of the sum

The other way follows the approaches of Zabell [16] and Diaconis and Freedman [5],
which were developed in the range of large deviations. See also van Campenhout
and Cover [15], who considered the density or the c.d.f. of $X_1^k$ conditioned
on the value of $S_1^n$ for fixed $k$. It is restricted in essence to the context of the
sample mean. The sketch of the method is as follows.
The density of $X_1$ given $S_1^n = ns$ writes

$$p_{X_1/S_1^n = ns}(x_1) = p_{X_1}(x_1)\,\frac{p_{S_2^n}(ns - x_1)}{p_{S_1^n}(ns)} \tag{13}$$

where we used the symbol $p$ to emphasize that the $X_i$'s are i.i.d. with common
density $p_{X_1}$. It is a known fact, and easy to establish, that the density defined
in (13) is invariant when sampling from any density of the form (8) instead of
$p_{X_1}$. This yields, selecting $a = s$,

$$p_{X_1/S_1^n = ns}(x_1) = \pi^s_{X_1}(x_1)\,\frac{\pi^s_{S_2^n}(ns - x_1)}{\pi^s_{S_1^n}(ns)}.$$

When the r.v.'s $X_i$ obey a local central limit theorem under the sampling
density $\pi^s_{X_1}$ it can be proved that

$$p_{X_1/S_1^n = ns}(x_1) = \pi^s_{X_1}(x_1)\,(1 + o(1)) \tag{14}$$

as $n$ tends to $\infty$. Diaconis and Freedman obtain such a statement when $X_1$ is
substituted by $X_1^k$ with $k/n \to \theta$, $0 \le \theta < 1$. We will continue this approach
in the range of moderate deviations, enhancing it to the density of $X_1^k$ with
$k/n \to 1$. Integrating with respect to the conditional distribution of $S_1^n$ under
the event $\mathcal{E}_n$ provides the required approximation.

The scope of the present paper is to present some technique which provides
typical realisations of runs $X_1^k$ under the conditional event $\mathcal{E}_n$ for very large $k$.
Therefore it aims at the exploration of the support of the distribution of $X_1^n$
under $\mathcal{E}_n$. The application which is presented pertains to Importance Sampling
for the estimation of rare events probabilities through the Adaptive Twisted IS
scheme.

Section 2 of this paper is devoted to the approximation of the conditional
density of $X_1^k$ under $\mathcal{E}_n$. Section 3 presents the ATIS algorithm, a number of
remarks for its practical implementation, and discusses efficiency. Section 4 is
devoted to M and L estimates and their moderate deviation probabilities. We
have postponed many proofs to the Appendix, except the main one of Section 2.



0.2 Conditioned random walks 
0.2.1 Three basic Lemmas 

Moderate deviations results for sums of i.i.d. real valued random variables
under our assumptions have been studied since the 50's by many authors. We
will make use of a local result, due to Richter [13], which we state as

Lemma 2 Under the general hypotheses and notation of this paper, when $a_n$ is
a sequence satisfying $\lim_{n\to\infty} a_n = 0$ together with $\sqrt{n}\,a_n \to \infty$ it holds

$$p\left(\frac{S_1^n}{n} = a_n\right) = \frac{\sqrt{n}\,\exp(-nI(a_n))}{\sqrt{2\pi}}\,(1 + O(a_n)).$$

The global counterpart of Lemma 2 in the form used here is due to Jensen
(see [12], corollary 6.4.1) and states

Lemma 3 Under the same hypotheses as above

$$P\left(\frac{S_1^n}{n} > a_n\right) = \frac{\exp(-nI(a_n))}{\sqrt{2\pi}\,\sqrt{n}\,\psi(a_n)}\,(1 + o(1))$$

where $\psi(a_n) := t^{a_n} s(t^{a_n})$.

The following known fact is used repeatedly. It states that the conditional
densities of sub-partial sums given the partial sum are invariant under any
tilting. Assume $X_1, \ldots, X_n$ i.i.d. with density $p$ and denote $\pi^a$ the corresponding
tilted density for some parameter $a$.

Lemma 4 For $1 \le i \le j \le n$, for all $a$ in the support of $P$, for all $u$ and $s$

$$p\left(S_i^j = u/S_1^n = s\right) = \pi^a\left(S_i^j = u/S_1^n = s\right).$$
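
As a numerical sanity check of the tail equivalent in Lemma 3, the sketch below treats the standard normal case, an assumption made here because everything is explicit: $I(a) = a^2/2$, $t^a = a$, $s(t^a) = 1$, hence $\psi(a_n) = a_n$, and the exact tail is available through the complementary error function. In that case the equivalent reduces to the classical Mills ratio bound, which always overestimates the tail, with a relative error of order $1/(na_n^2)$. Function names are hypothetical.

```python
import math


def exact_tail(n, a_n):
    """Exact P(S_1^n / n > a_n) for N(0,1) summands."""
    return 0.5 * math.erfc(math.sqrt(n) * a_n / math.sqrt(2.0))


def lemma3_equivalent(n, a_n):
    """exp(-n I(a_n)) / (sqrt(2 pi) sqrt(n) psi(a_n)) in the Gaussian case,
    where I(a) = a^2 / 2 and psi(a) = a."""
    return math.exp(-n * a_n * a_n / 2.0) / (math.sqrt(2.0 * math.pi) * math.sqrt(n) * a_n)


if __name__ == "__main__":
    for n in (10_000, 40_000):
        a_n = 0.05
        # the ratio approaches 1 from above as sqrt(n) * a_n grows
        print(n, lemma3_equivalent(n, a_n) / exact_tail(n, a_n))
```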
0.2.2 Typical paths conditioned on their sum

The sequence of constants $a_n$ defining $A$ in (3) and $\mathcal{E}_n$ in (4) satisfies



(A) 



lim n ^oo a n \/n — k = 



x 



lim n _ 



na n 



/ n—k 

lim n ^ 00 a n (logn) 2+ = for some positive 5 

n—k 

L n — >-oo 



lim™ ^ = 

In this section we obtain a close approximation for $p(X_1^k = Y_1^k/S_1^n = n\sigma)$ for
$k = k_n$ where $a_n$ and $k$ satisfy the following set of conditions.
The value of $\sigma$ satisfies $a_n < \sigma < a_n + c_n$ with



(C) 



limn^oo na n c n = oo 
lim — ^= = 

lim «— S^feJ = 
lim cxp -nn„e„ _ n 



We denote (A1),...,(A4) , (C1),...,(C4) the above conditions. 

It appears clearly from (5) that the optimal choice $g = p_{X_1^n/S_1^n > na_n}$ need only
hold on paths $Y_1^n$ sampled under $g$ and not on all $\mathbb{R}^n$. In a similar way the
approximation of the optimal density needs to be realized only when evaluated on
samples $Y_1^n(l)$ generated according to this approximation, and approximation of
$p_{X_1^n/\mathcal{E}_n}$ on the entire space $\mathbb{R}^n$ is not needed. The approximation of $p_n$ by such a
density $g_n$ is difficult to obtain on realizations under $g_n$ and much easier under
$p_{X_1^n/S_1^n > na_n}$. The following Lemma proves that approximating $p_n$ by $g_n$ under
$p_n$ is similar to approximating $p_n$ by $g_n$ under $g_n$.

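Let $\mathfrak{R}_n$ and $\mathfrak{S}_n$ denote two p.m.'s on $\mathbb{R}^n$ with respective densities $r_n$ and $s_n$.

Lemma 5 Suppose that for some sequence $\varepsilon_n$ which tends to 0 as $n$ tends to
infinity

$$r_n(Y_1^n) = s_n(Y_1^n)\left(1 + o_{\mathfrak{R}_n}(\varepsilon_n)\right) \tag{15}$$

as $n$ tends to $\infty$. Then

$$s_n(Y_1^n) = r_n(Y_1^n)\left(1 + o_{\mathfrak{S}_n}(\varepsilon_n)\right). \tag{16}$$

Proof. Denote

$$A_{n,\varepsilon_n} := \left\{y_1^n : (1 - \varepsilon_n)\,s_n(y_1^n) \le r_n(y_1^n) \le s_n(y_1^n)\,(1 + \varepsilon_n)\right\}.$$

It holds for all positive $\delta$

$$\lim_{n\to\infty} I(n,\delta) = 1$$

where

$$I(n,\delta) := \mathfrak{R}_n\left(A_{n,\delta\varepsilon_n}\right) = \int 1_{A_{n,\delta\varepsilon_n}}(y_1^n)\,\frac{r_n(y_1^n)}{s_n(y_1^n)}\,s_n(y_1^n)\,dy_1^n.$$

Since

$$I(n,\delta) \le (1 + \delta\varepsilon_n)\,\mathfrak{S}_n\left(A_{n,\delta\varepsilon_n}\right)$$

it follows that

$$\lim_{n\to\infty}\mathfrak{S}_n\left(A_{n,\delta\varepsilon_n}\right) = 1,$$

which proves the claim. ■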

This shows that the approximation of $p_n$ need not be achieved on the
whole space $\mathbb{R}^n$ but only on typical paths under the conditioning event $\mathcal{E}_n$. It
appears that such a sharp approximation is possible on quite long portions $Y_1^k$
of sample paths generated under $\mathfrak{P}_n$, when $k$ tends to $\infty$ together with $n$ and
$k/n$ goes to 1.

Let $\sigma$ be such that $a_n < \sigma < b_n$ with $b_n - a_n$ small enough. We prove that the
sequence of conditional densities $p(X_1^k = Y_1^k/S_1^n = n\sigma)$ is closely approximated
by a sequence of suitably modified tilted densities when evaluated at a
realization $Y_1^k$ under the density $p_n$. This is the scope of Proposition 8 hereunder.
The size of $b_n - a_n$ is such that $p_n(Y_1^k)$ can be substituted by an integral of
$p(X_1^k = Y_1^k/S_1^n = n\sigma)$ with respect to the distribution of $S_1^n$ conditionally on
$(S_1^n \in (na_n, nb_n))$. This is the scope of Proposition 15.

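Define $\Sigma_1^i := Y_1 + \ldots + Y_i$ and $t_{i,n}$ through

$$m(t_{i,n}) = m_{i,n} := \frac{n}{n-i}\left(\sigma - \frac{\Sigma_1^i}{n}\right). \tag{17}$$

Define also

$$s_{i,n}^2 := \frac{d^2}{dt^2}\left(\log E_{\pi^{m_{i,n}}}\exp tX_1\right)(0)$$

and

$$\mu_3^{(i,n)} := \frac{d^3}{dt^3}\left(\log E_{\pi^{m_{i,n}}}\exp tX_1\right)(0)$$

which are the variance and the third centered moment of $\pi^{m_{i,n}}$, reflecting the corresponding
characteristics of $p$, since $t_{i,n}$ is close to 0 as shown in the following result.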

Lemma 6 Let $\sigma$ belong to $[a_n, b_n)$ and assume that (A) holds together with
(C2) and (C3). Then under $\mathfrak{P}_n$, $t_{i,n}$ tends to 0, $s_{i,n}^2$ tends to 1 and $\mu_3^{(i,n)}$ tends
to the third centered moment of $p$ uniformly upon $\sigma$ in $(a_n, b_n)$.

Proof. Write

$$m(t_{i,n}) = \frac{n}{n-i}\,(\sigma - a_n) + \frac{n}{n-i}\left(a_n - \frac{\Sigma_1^i}{n}\right)$$

which goes to 0 under $\mathfrak{P}_n$ uniformly upon $\sigma$ under (C2) and (C3) where we used
Lemma ??; therefore $t_{i,n}$ goes to 0 uniformly in $\sigma$, which concludes the proof. ■

The following density $g_\sigma(y_1^k)$ defined in (20) on $\mathbb{R}^k$ provides the sharp
approximation of $p(X_1^k = y_1^k/S_1^n = n\sigma)$. This density is defined on $\mathbb{R}^k$ as a product
of conditional densities which are set in the following displays. It only
approximates $p(X_1^k = y_1^k/S_1^n = n\sigma)$ on typical vectors $y_1^k$ which are realizations of
$X_1^k$ under $\mathcal{E}_n$. Choose any density $g_0(y_1)$ (for convenience denoted $g_0(y_1/y_0)$ in
(20)) and for $1 \le i \le k-1$ define recursively the sequence of conditional densities
$g_i(y_{i+1}/y_1^i)$ through

$$g_0(y_1) = \pi^{\sigma}(y_1)$$

and

$$g_i(y_{i+1}/y_1^i) = \frac{\exp\left(y_{i+1}\left(t_{i,n} + \dfrac{\mu_3^{(i,n)}}{2s_{i,n}^4(n-i-1)}\right) - \dfrac{y_{i+1}^2}{2s_{i,n}^2(n-i-1)}\right)p(y_{i+1})}{K_i(y_1^i)} \tag{18}$$

a density on $\mathbb{R}$, with $t_{i,n}$ the unique solution of the equation

$$m(t_{i,n}) = \frac{n}{n-i}\left(\sigma - \frac{s_1^i}{n}\right)$$

where $s_1^i := y_1 + \ldots + y_i$. The normalizing factor $K_i(y_1^i)$ is

$$K_i(y_1^i) = \int \exp\left(x\left(t_{i,n} + \frac{\mu_3^{(i,n)}}{2s_{i,n}^4(n-i-1)}\right) - \frac{x^2}{2s_{i,n}^2(n-i-1)}\right)p(x)\,dx. \tag{19}$$

Define $g_\sigma$ the density on $\mathbb{R}^k$ through

$$g_\sigma(y_1^k) := \prod_{i=0}^{k-1} g_i(y_{i+1}/y_1^i). \tag{20}$$

The definition in (18) can also be stated as

$$g_i(y_{i+1}/y_1^i) = C_i\,p(y_{i+1})\,\mathfrak{n}(\alpha\beta, \alpha, y_{i+1})$$

where $\mathfrak{n}(\mu, \sigma^2, x)$ is the normal density with mean $\mu$ and variance $\sigma^2$ at $x$. Here

$$\alpha = s_{i,n}^2(n-i-1), \qquad \beta = t_{i,n} + \frac{\mu_3^{(i,n)}}{2s_{i,n}^4(n-i-1)}$$

and the constant $C_i$ is the normalizing factor, $C_i^{-1} = K_i(y_1^i)\,\mathfrak{n}(\alpha\beta, \alpha, 0)$. This form is appropriate for the simulation.
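
One possible way to draw from a density of the form $C\,p(y)\,\mathfrak{n}(\alpha\beta, \alpha, y)$ is sketched below in Python. This is an illustrative rejection sampler chosen here, not the simulation procedure of the paper. It rests on the identity $p(y)\,\mathfrak{n}(\alpha\beta,\alpha,y) \propto p(y)e^{\beta y}\cdot e^{-y^2/(2\alpha)}$: a proposal is drawn from the exponentially tilted density proportional to $p(y)e^{\beta y}$ and accepted with probability $e^{-y^2/(2\alpha)}$. The sampler `sample_tilted` for the tilted density is assumed to be supplied by the user (for standard normal $p$ it is simply $N(\beta, 1)$); all function names are hypothetical.

```python
import math
import random


def g_i_parameters(t_in, s2_in, mu3_in, n, i):
    """alpha and beta of the normal factor, from t_{i,n}, s^2_{i,n}, mu_3^{(i,n)}."""
    r = n - i - 1
    alpha = s2_in * r
    beta = t_in + mu3_in / (2.0 * s2_in ** 2 * r)
    return alpha, beta


def sample_g_i(rng, sample_tilted, alpha, beta):
    """Rejection sampler for the density proportional to
    p(y) * exp(beta * y - y^2 / (2 * alpha)).

    sample_tilted(rng, beta) must return a draw from the density
    proportional to p(y) * exp(beta * y)  (user-supplied assumption).
    """
    while True:
        y = sample_tilted(rng, beta)
        # acceptance probability exp(-y^2 / (2 alpha)) is at most 1
        if rng.random() < math.exp(-y * y / (2.0 * alpha)):
            return y


if __name__ == "__main__":
    # Gaussian illustration: p = N(0,1), so the tilted proposal is N(beta, 1)
    alpha, beta = g_i_parameters(t_in=0.5, s2_in=1.0, mu3_in=0.0, n=10, i=5)
    rng = random.Random(0)
    draws = [sample_g_i(rng, lambda r, b: r.gauss(b, 1.0), alpha, beta)
             for _ in range(20000)]
    mean = sum(draws) / len(draws)
    # for Gaussian p the target is N(alpha*beta/(alpha+1), alpha/(alpha+1))
    print(mean, alpha * beta / (alpha + 1.0))
```

When $n - i - 1$ is large, $\alpha$ is large and the acceptance probability is close to 1, so few proposals are wasted; proposing directly from $p$ instead would be far less efficient since $\alpha\beta$ is then far from the bulk of $p$.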

The density $g_i(y_{i+1}/y_1^i)$ is a slight modification of $\pi^{m_{i,n}}(y_{i+1})$. It
approximates sharply $p(X_{i+1} = y_{i+1}/S_1^n = n\sigma, y_1^i)$. For small values of $i$, the
contribution of $y_{i+1}\,\mu_3^{(i,n)}/(2s_{i,n}^4(n-i-1))$ and of $y_{i+1}^2/(2s_{i,n}^2(n-i-1))$ is small and $g_i(y_{i+1}/y_1^i)$
fits nearly with $\pi^{a_n}(y_{i+1})$, when $\sigma$ is close to $a_n$, which is in accordance both
with Diaconis and Freedman's approximation when translated in the moderate
deviation range and with Ermakov's IS scheme.
Remark 7 When the $X_i$'s are i.i.d. normal then $g_i(y_{i+1}/y_1^i) = p(y_{i+1}/y_1^i, S_1^n/n = \sigma)$
for all $i$.

We then have

Proposition 8 Set $\sigma$ with $a_n < \sigma < b_n$ and assume (A) together with (C2)
and (C3). Let $Y_1^n$ be a sample with distribution $\mathfrak{P}_n$. Then uniformly upon $\sigma$

$$p(X_1^k = Y_1^k/S_1^n = n\sigma) = g_\sigma(Y_1^k)\left(1 + o_{\mathfrak{P}_n}\left(a_n(\log n)^{2+\delta}\right)\right). \tag{21}$$

Proof. The proof uses Bayes formula to write $p(X_1^k = Y_1^k/S_1^n = n\sigma)$ as a
product of $k$ conditional densities of individual terms of the trajectory evaluated
at $Y_1^k$, and the invariance property stated in Lemma 4. Each term of this product
is approximated through an Edgeworth expansion which, together with the three
preceding lemmas, concludes the proof. It holds

$$p(X_1^k = Y_1^k/S_1^n = n\sigma) = p(X_1 = Y_1/S_1^n = n\sigma) \tag{22}$$

$$\qquad \times \prod_{i=1}^{k-1} p\left(X_{i+1} = Y_{i+1}/X_1^i = Y_1^i, S_1^n = n\sigma\right) \tag{23}$$

$$= \prod_{i=0}^{k-1} p\left(X_{i+1} = Y_{i+1}/S_{i+1}^n = n\sigma - \Sigma_1^i\right)$$

by the independence of the r.v.'s $X_i$; we have set $\Sigma_1^0 := 0$. By Lemma 4

$$p\left(X_{i+1} = Y_{i+1}/S_{i+1}^n = n\sigma - \Sigma_1^i\right) = \pi^{m_{i,n}}\left(X_{i+1} = Y_{i+1}/S_{i+1}^n = n\sigma - \Sigma_1^i\right)$$

$$= \pi^{m_{i,n}}\left(X_{i+1} = Y_{i+1}\right)\frac{\pi^{m_{i,n}}\left(S_{i+2}^n = n\sigma - \Sigma_1^{i+1}\right)}{\pi^{m_{i,n}}\left(S_{i+1}^n = n\sigma - \Sigma_1^i\right)}$$

where we used Bayes formula and the independence of the $X_j$'s under $\pi^{m_{i,n}}$. A
precise evaluation of the dominating terms in this latter expression is needed
in order to handle the product (22).

Under the sequence of densities $\pi^{m_{i,n}}$ the i.i.d. r.v.'s $X_{i+1}, \ldots, X_n$ define a
triangular array which satisfies a local central limit theorem, and an Edgeworth
expansion. Under $\pi^{m_{i,n}}$, $X_{i+1}$ has expectation $m_{i,n}$ and variance $s_{i,n}^2$. Center
and normalize both the numerator and denominator in the fraction which
appears in the last display. Denote $\overline{\pi}_{n-i-1}$ the density of the normalized partial
sum $\left(S_{i+2}^n - (n-i-1)m_{i,n}\right)/\left(s_{i,n}\sqrt{n-i-1}\right)$ when the summands are i.i.d.
with common density $\pi^{m_{i,n}}$. Hence, evaluating both $\overline{\pi}_{n-i-1}$ and its normal
approximation at point $Y_{i+1}$,

$$p\left(X_{i+1} = Y_{i+1}/S_{i+1}^n = n\sigma - \Sigma_1^i\right) = \frac{\sqrt{n-i}}{\sqrt{n-i-1}}\,\pi^{m_{i,n}}\left(X_{i+1} = Y_{i+1}\right)\frac{\overline{\pi}_{n-i-1}\left((m_{i,n} - Y_{i+1})/(s_{i,n}\sqrt{n-i-1})\right)}{\overline{\pi}_{n-i}(0)}. \tag{24}$$
The sequence of densities $\overline{\pi}_{n-i-1}$ converges pointwise to the standard normal
density under the assumptions, when $n - i$ tends to infinity, i.e. when $n - k_n$
tends to infinity, and an Edgeworth expansion to the order 5 is performed for
the numerator and the denominator.

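Set $Z_{i+1} := (m_{i,n} - Y_{i+1})/\left(s_{i,n}\sqrt{n-i-1}\right)$. Using Lemma [Ml we have

$$m_{i,n} - Y_{i+1} = a_n - Y_{i+1} + \frac{n}{n-i}\,(\sigma - a_n) + \frac{ia_n - \Sigma_1^i}{n-i}. \tag{25}$$

It then holds

$$\overline{\pi}_{n-i-1}(Z_{i+1}) = \mathfrak{n}(Z_{i+1})\left[1 + \frac{1}{\sqrt{n-i-1}}P_3(Z_{i+1}) + \frac{1}{n-i-1}P_4(Z_{i+1}) + \frac{1}{(n-i-1)^{3/2}}P_5(Z_{i+1})\right] + O_{\mathfrak{P}_n}\left(\frac{1}{(n-i-1)^{3/2}}\right). \tag{26}$$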

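We perform an expansion in $\mathfrak{n}(Z_{i+1})$ up to the order 3, with a first order term
$\mathfrak{n}\left(-Y_{i+1}/(s_{i,n}\sqrt{n-i-1})\right)$, namely

$$\begin{aligned}
\mathfrak{n}(Z_{i+1}) = \mathfrak{n}\left(\frac{-Y_{i+1}}{s_{i,n}\sqrt{n-i-1}}\right)\Bigg(&1 + \frac{Y_{i+1}m_{i,n}}{s_{i,n}^2(n-i-1)} + \frac{m_{i,n}^2}{2s_{i,n}^2(n-i-1)}\left(\frac{Y_{i+1}^2}{s_{i,n}^2(n-i-1)} - 1\right) \\
&+ \frac{m_{i,n}^3}{6s_{i,n}^3(n-i-1)^{3/2}}\,\frac{\mathfrak{n}^{(3)}(Y^*)}{\mathfrak{n}\left(-Y_{i+1}/(s_{i,n}\sqrt{n-i-1})\right)}\Bigg)
\end{aligned} \tag{27, 28}$$

where $Y^* = \frac{1}{s_{i,n}\sqrt{n-i-1}}\left(-Y_{i+1} + \theta m_{i,n}\right)$ with $|\theta| < 1$. Only the first order term
is relevant when handling the conditional density of the sub trajectory $Y_1^k$.
Write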



n — i — 1 



n — i — 1 



(cr - o«) 



and use Lemmas ?? and [26] to obtain 

l^+im^l Q«p„(logfc) ( Mn ( 1 , ri vw 9Q x 

-n(--a n ) °^ 0gk h l + o^(l)) 



s ln (n-i-1) 



(n-i-iy 



and 



1 



s ln 



n — i — 1 

.,2 



a„ + 0<p„ 



1 



\Jn — i 



(l + o Vn (l)) (30) 



(n-i - 1)' 



■ (a - a n f (1 + o v Jl)) 



+2 



n (a - a n ) 
(n — i — l) 2 



a n + Oqj, 



1 

\Jn — i 



(l + o<p„(l)). 
where the $1 + o_{\mathfrak{P}_n}(1)$ terms stem from the convergence of $s_{i,n}^2$ to 1 by Lemma
6. Assuming (C2) it follows that

\Y i+1 m i<n \ _ 0<p„(logfc) 



and 



a n + Vn -== (l + o<p„(l)) 



s ln (n-i-l) n-i-l \ 
which yields 

n(Z i+1 ) = n(-y i+1 /(vVn-i-l)) (l + 0<p„ (^^)) • (31) 

The Hermite polynomials depend upon the moments of the underlying den- 
sity 7r mi '™. Since 7T™ 1 '™ has expectation and variance 1 the terms correspond- 
ing to Pi and P2 vanish. Up to the order 4 the polynomials write P%(x) = 

^{x* - 3x), P 4 (x) = j^ix* - 3x) + (x* + 6. 2 - 3). 

In order to obtain a development of the polynomial bracket in (|26p in terms 
of powers of (n — i) only the term in x from P3 and the constant term from P4 
are relevant. It holds 

P 3 (Z t+1 ) _ fj} 3 ' fj> 3 ' ' n(a-a n ) 



(a n — Yi + i) — 



Vn-i-1 2 fl * n (n-i-l) w ' "™ 24„ (n-i-l) 



6( Sl ,„) 6 (n-z-l) 2 +0,? " I (n-z) 3/2 



When (C3) holds then 



? {Zi+1 \ = - „ 4 ^ ^1 (32) 

u {i ' n) ( 1 \ /I 



2sl n (n-i-iy^ \{n-if\ 
For the term of order 4 it holds 



n-i-l n-i-l \l2s? n v ^ y 24sf; n (n - i - 1) 
When (C2) and (C3) hold it follows that 



(33) 



n-i-l 8a4 n (n-t-ir ^ (n - ? - 1) 3/2 , 






The fifth term in the expansion plays no role in the asymptotics, under (A). To 
sum up and using (A) and Lemma [26] we get 



^n-i-l ( Z i+l) = n (- Y i+l/ (si,nVn - i - lj^j 



„(*,») 

A 1 3 V 

\ 1 1- 



( I . 

1 ~r 2aJ_ B (n-t-l) 1+1 

(34) 

Turn back to (24) and do the same Edgeworth expansion in the denominator,
which writes



C-!f(0)=n(0) 1 



(i,n) _ 4 1 

8<> - 



O 



1 



. ( n - i) 



.3/2 



(35) 



Summarizing and using both (32) and (33) we obtain
p(X i+1 =Y i+1 /S"+i=^-£") 



(36) 



yjn — i 
V 'n — i — 1 



exp y i+ i , 



(i,n) 
M 3 



2*?.« (n - i - 1) 



(2*?, n (n - * - 1)) 



1 + 0<p r , 



a„ log n 



The term $\exp\left(-Y_{i+1}^2/\left(2s_{i,n}^2(n-i-1)\right)\right)$ in $g_i(Y_{i+1}/Y_1^i)$ comes from the ratio of the
two Gaussian densities $\mathfrak{n}(Z_{i+1})$ and $\mathfrak{n}(0)$. Taking logarithms and using standard

calculus provides the result in (|18p : indeed the constant term — gjl — in 
(|33[) combines with the corresponding one in (I35|) to produce a term of order 



O 



whose sum is O- 



We now prove that $K_i$ as defined in (19) satisfies

1 



i^) =0(i i>n ) ^1 

This will conclude the proof. 
Use the classical bounds 



2(n-i- 1) 



O 



(n — i) 



3/2 



(37) 



l- M+T - ¥ <e-«<l- U+y 

to obtain on both sides of the above inequalities the second order approximation 
of Ki(Yi). The upper bound is 



Ki(Yl) < <f>(ti,n) + 



(i,n) 



2*?.„ (« - * - 1) 



<t>' (ti,n) + —^—T~7 



(i,n)2 



(2)»<„ {n -i-lf 



0" (*i,n) 



1 

2^> - i - 1) 



0" (ti,, 



(i,n) 

2*L (n - * - 1) 



(3) (^,«) 
The lower bound is the same up to order 2 and the third order term plays no
role.

Use Lemma 6 to conclude, making a Taylor expansion in $\Phi(t_{i,n})$, $\Phi'(t_{i,n})$
and $\Phi''(t_{i,n})$. The dominating terms are due to $\Phi(t_{i,n})$ and $-\frac{1}{2s_{i,n}^2(n-i-1)}\Phi''(t_{i,n})$,
which yield the $1 - \frac{1}{2(n-i-1)}$ term in (37). The other terms are indeed




using Lemma [Ml leading to (37). Hence (??) writes as

© = ^(Yi+i/y?) (1 + o Vn (^^)) ■ 

Putting the pieces together yields under (A)

$$p(X_1^k = Y_1^k/S_1^n = n\sigma) = \left(1 + o_{\mathfrak{P}_n}\left(a_n(\log n)^{2+\delta}\right)\right)\prod_{i=0}^{k-1} g_i(Y_{i+1}/Y_1^i).$$

Uniformity upon $\sigma$ is a consequence of Lemma 6. This closes the proof of the
Proposition. ■

Remark 9 When the $X_i$'s are i.i.d. normal, then the result in the above
Proposition holds with $k = n$ stating that $p(X_1^n = x_1^n/S_1^n = n\sigma) = g_\sigma(x_1^n)$ for all $x_1^n$
in $\mathbb{R}^n$.

Remark 10 The density in (18) is a slight modification of $\pi^{m_{i,n}}$. However
second order terms are required here in order to handle the approximation of the
density of $X_{i+1}$ conditioned upon $X_1^i$ and $S_1^n/n$. The modification from $\pi^{m_{i,n}}$ to
$g_i$ is a small shift in the location parameter, which reflects the asymmetry of the
underlying distribution $p$, and a change in the variance: large values of $X_{i+1}$
have smaller weight for large $i$, which is to say that the distribution of $X_{i+1}$
tends to concentrate around $m_{i,n}$ as $i$ approaches $k$.

Remark 11 The "moderate deviation" case is typically $a_n = n^{-\tau}$, for $\tau$ in
$(0, 1/2)$. In this case the condition $a_n(\log n)^{2+\delta} \to 0$ holds for all values of $\tau$.
The other case is when $a_n$ is "nearly constant", in the range $a_n = (\log n)^{-\gamma}$,
decreasing very slowly to 0, with $\gamma > 2 + \delta$, $\delta > 0$.

Remark 12 In Lemmas 11 and 27, as in the previous Proposition, we use
an Edgeworth expansion for the density of the normalized sum of the $n$-th row
of some triangular array of row-wise independent r.v.'s with common density.
Consider the i.i.d. r.v.'s $X_1, \ldots, X_n$ with common density $\pi^a(x)$ where $a$ may
depend on $n$ but remains bounded. The Edgeworth expansion pertaining to $\pi^a$
can be derived following closely the proof given for example in [5], pp. 532 and
following, substituting the cumulants of $p$ by those of $\pi^a$. Denote $\varphi^a(z)$ the
characteristic function of $\pi^a(x)$. Clearly for any $\delta > 0$ there exists $q_{a,\delta} < 1$
such that $|\varphi^a(z)| < q_{a,\delta}$ for $|z| > \delta$, and since $a_n$ is bounded, $\sup_n q_{a,\delta} < 1$. Therefore the
inequality (2.5) in p533 holds. With $\psi_n$ defined as in (2.6), the same holds with
$\varphi$ replaced by $\varphi^a$ and $\sigma$ by $s(t^a)$; (2.9) holds, which completes the proof of the
Edgeworth expansion in the simple case. The proof goes in the same way for
higher order expansions. This justifies our argument in the Lemmas cited above.
In the proof of Proposition 8 we made use of such expansions when the r.v.'s
$X_{i+1}, \ldots, X_n$ are i.i.d. with common density $\pi^{m_{i,n}}(x)$. The same argument as
sketched above applies in this case also.



0.2.3 Conditioning on final events E n 

Let T := S"/n with distribution under the conditioning event £ n . Hence for 
any Borel set A 

/Si \ 

P(TeA)=% n Uei . (38) 



The distribution of $T$ is concentrated on a small neighborhood of $a_n$. Indeed we have

Lemma 13 Assume that (A1) holds. For any sequence $c_n$ such that (C1) holds,

$P(a_n < T < a_n + c_n) = 1 + O(\exp(-n a_n c_n)).$

Proof. Use Lemma 3. ■

Moreover $T$ is asymptotically exponentially distributed. The asymptotic distribution of $T$ is captured in the following

Lemma 14 When (A1) holds then for all $u$ in $\mathbb{R}^+$ the r.v. $Z := n t^{a_n}(T - a_n)$ satisfies

$p_Z(u) = e^{-u}(1 + o(1))$

where $m(t^{a_n}) = a_n$, and therefore $T = a_n + O_{\mathfrak{P}_n}\left(\frac{1}{n a_n}\right)$.

Proof. Write

$p_Z(u) = \frac{1}{n t^{a_n}} \, \frac{p_{S_1^n/n}\left(a_n + u/(n t^{a_n})\right)}{P(S_1^n/n > a_n)}$

and use Lemmas 5 and 3. A first order expansion yields $a_n = m(t^{a_n}) = t^{a_n}(1 + o(1))$, which proves the claim. ■
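The exponential limit in Lemma 14 can be checked numerically in the one case where everything is explicit. The sketch below is an illustration only, under the assumption that the summands are standard normal, so that $t^{a_n} = a_n$ and $S_1^n/n$ is $N(0, 1/n)$; the function names and the choice $a_n = n^{-1/4}$ are hypothetical and not taken from the paper.

```python
import math

def normal_tail(x):
    """P(N(0,1) > x), computed with the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tail_of_Z(u, n, a_n):
    """Exact P(Z > u) for Z = n * a_n * (T - a_n) with N(0,1) summands.

    T is S_n / n conditioned on S_n / n > a_n, and t^{a_n} = a_n here.
    """
    z = math.sqrt(n) * a_n            # standardized threshold
    shift = u / (math.sqrt(n) * a_n)  # standardized version of u / (n * a_n)
    return normal_tail(z + shift) / normal_tail(z)

def exponential_gap(u, n, tau=0.25):
    """|P(Z > u) - exp(-u)| for the moderate deviation choice a_n = n**(-tau)."""
    a_n = n ** (-tau)
    return abs(tail_of_Z(u, n, a_n) - math.exp(-u))

if __name__ == "__main__":
    for n in (100, 10**4, 10**6):
        print(n, exponential_gap(1.0, n))
```

The gap to the exponential tail shrinks as $n$ grows, which is the behavior the Lemma describes; for moderate $n$ (such as $n = 100$) the approximation is still visibly rough.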

In this Section Proposition [8] is extended in order to provide an approximation of $p_n(Y_1^k)$ when $Y_1^k$ is a random vector generated under $p_n$. This is obtained through an integration w.r.t. $a$ in (21): indeed it holds

$p_n(Y_1^k) = \int_{a_n}^{\infty} p\left(X_1^k = Y_1^k / T = a\right) p_T(a)\, da$ (39)

and the domain of integration can be reduced to a small neighborhood of $a_n$ which contains nearly all the realizations of $T$ under $\mathcal{E}_n$. This argument allows the interchange of asymptotic equivalents and integration.
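Define

$g_n(x_1^k) := \int g_a(x_1^k)\, p_T(a)\, da$

where $g_a(x)$ is defined in (20).

When $g_a$ is substituted by $g_n$ then

$p_n(Y_1^k) = g_n(Y_1^k)\left(1 + o_{\mathfrak{P}_n}(1)\right)$

does not hold.
Let

$b_n := a_n + c_n$

where $c_n$ is fitted compatibly with Proposition [5].
Define

$g_{b_n}(x_1^k) := \frac{\int_{a_n}^{b_n} g_a(x_1^k)\, p_T(a)\, da}{P(a_n < T < b_n)}.$ (40)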

Proposition 15 When $Y_1^k$ is a random vector generated with density $p_n$ and (A) and (C) hold then

$p_n(Y_1^k) = g_{b_n}(Y_1^k)\left(1 + o_{\mathfrak{P}_n}\left(a_n (\log n)^{2+\delta}\right)\right).$ (41)

The proof of Proposition [15] relies upon the following Lemma, whose proof is postponed to the Appendix.

Lemma 16 Let $b_n$ satisfy $b_n = a_n + c_n$ and let (A) and (C) hold. Then when $Y_1^n$ is generated under $p_n$ it holds

$p_n(Y_1^k) = \int_{a_n}^{b_n} p\left(Y_1^k / T = a\right) p(T = a)\, da \left(1 + O_{\mathfrak{P}_n}\left(\exp(-n a_n c_n)\right)\right).$

We now prove Proposition [15] through an integration of the local approximation given in Proposition [8].
For all $a$ in $(a_n, b_n)$

$g_a(Y_1^k) = p\left(Y_1^k / T = a\right)\left(1 + o_{\mathfrak{P}_n}\left(a_n (\log n)^{2+\delta}\right)\right)$ (42)

uniformly on $a$ when $Y_1^n$ is sampled under $p_n$. It then holds

$g_{b_n}(Y_1^k) := \frac{\int_{a_n}^{b_n} g_a(Y_1^k)\, p(T = a)\, da}{P(a_n < T < b_n)}$

$= \int_{a_n}^{b_n} p\left(Y_1^k / T = a\right) p(T = a)\, da \left(1 + o_{\mathfrak{P}_n}\left(a_n (\log n)^{2+\delta}\right)\right)$

$= p_n(Y_1^k)\left(1 + o_{\mathfrak{P}_n}\left(a_n (\log n)^{2+\delta}\right)\right)$

where we used Lemmas [?] together with (C4), which helps to keep the $o_{\mathfrak{P}_n}\left(a_n (\log n)^{2+\delta}\right)$ term. This concludes the proof of Proposition 15.

As a consequence of Lemma [5] the following result holds, which asserts that, when sampled under $g_{b_n}$, the likelihood of the random vector $X_1^k$ approximates $p_n$.

Proposition 17 Assume (A) and (C). Let $X_1^k$ be a random vector with p.m. $G_n$ with density $g_{b_n}$ on $\mathbb{R}^k$ defined in (40). It holds

$g_{b_n}(X_1^k) = p_n(X_1^k)\left(1 + o_{G_n}\left(a_n (\log n)^{2+\delta}\right)\right)$

as $n \to \infty$.



0.3 The Adaptive Twisted Importance Sampling scheme

The last result in Proposition [17] above suggests that an Importance Sampling density deduced from $g_{b_n}$ would benefit from some optimality as defined in the Introduction, since it fits with the conditional density on long runs. It is enough to approximate the conditional distribution of $T = S_1^n/n$ under $\mathcal{E}_n$ by Lemma 14 and to plug this approximation into (40).

Let $E$ denote a r.v. with exponential distribution with parameter $n a_n$ on $(a_n, +\infty)$,

$p_E(t) := n a_n e^{-n a_n (t - a_n)}\, 1_{(a_n, +\infty)}(t).$ (43)

Using again Lemmas [2] and [3] it is easily checked that

$\sup_{n a_n < s < n b_n} \frac{p(E = s)}{p(T = s)} = 1 + o(\varepsilon_n)$

for some sequence $\varepsilon_n$ which tends to 0, from which

$g(X_1^k) := \frac{\int_{a_n}^{b_n} g_a(X_1^k)\, p(E = a)\, da}{\int_{a_n}^{b_n} p(E = a)\, da} = p_n(X_1^k)\left(1 + \varepsilon'_n\right)$ (44)

with $\lim_{n\to\infty} \varepsilon'_n = 0$, which proves that we may substitute $T$ by the exponential r.v. $E$ while keeping the properties of the IS procedure. We denote

$g(x_1^n) := g(x_1^k) \prod_{i=k+1}^{n} \pi^{\alpha_k}(x_i)$ (45)

the sampling scheme under which the estimate ([5]) is computed; in (45) the value of $\alpha_k$ is defined through

$\alpha_k := m(t_k)$

with

$m(t_k) = \frac{n}{n-k}\left(a_n - \frac{s_1^k}{n}\right).$

0.3.1 The Adaptive Twisted IS algorithm 

Since the r.v. $E$ is highly concentrated in a small neighborhood of $a_n$ we suggest dropping $b_n$ in the definition (45) of $g$ above and integrating on $(a_n, \infty)$ instead of $(a_n, b_n)$. Numerical experiments argue in favor of this heuristic. The remarks at the end of this paragraph provide simple and efficient solutions for the effective calculation of the estimate.
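Drawing the mixing variable $E$ is the cheapest step of the scheme: its density $n a_n e^{-n a_n (t - a_n)}$ on $(a_n, \infty)$ is that of $a_n$ plus an exponential variable with rate $n a_n$. The following is a minimal sketch of that draw; the function name `draw_E` and the numerical values are illustrative assumptions.

```python
import random

def draw_E(n, a_n, rng):
    """Draw E with density n*a_n*exp(-n*a_n*(t - a_n)) on (a_n, +infinity).

    E is a_n shifted by an exponential variable with rate n * a_n.
    """
    return a_n + rng.expovariate(n * a_n)

if __name__ == "__main__":
    rng = random.Random(0)
    sample = [draw_E(100, 0.232, rng) for _ in range(5)]
    print(sample)  # values slightly above a_n = 0.232
```

With $n = 100$ and $a_n = 0.232$ the mean of $E$ is $a_n + 1/(n a_n) \approx 0.275$, which shows how tightly the mixing variable sits above $a_n$.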

1- Draw M independent random variables $E^1, \ldots, E^M$ with distribution (43) and define the density on $\mathbb{R}^n$

$\hat g(x_1^n) := \frac{1}{M} \sum_{m=1}^{M} \left( g_{E^m}(x_1^k) \prod_{i=k+1}^{n} \pi^{\alpha_k}(x_i) \right)$ (46)

where $g_{E^m}$ is defined as

$g_{E^m}(x_1^k) := \prod_{i=0}^{k-1} g_{i+1}(x_{i+1}/x_1^i)$ (47)

where $g_{i+1}(x_{i+1}/x_1^i)$ is defined in (18) for $i \geq 1$, $g_0(x_1) = \pi^{a_n}(x_1)$ and

$\pi^{\alpha_k}(x) := \frac{e^{t_k x}}{\phi(t_k)}\, p(x)$ (48)

where $\alpha_k = m(t_k)$ and $t_k$ is the only solution of the equation

$m(t_k) = \frac{n}{n-k}\left(a_n - \frac{s_1^k}{n}\right)$ (49)

with $s_1^k := x_1 + \ldots + x_k$, with $s_1^0 = 0$.

2- Define L, the number of replications of the simulated random trajectory to be performed.

3- For l between 1 and L do

{

draw a random variable $E(l)$ with distribution (43);

draw the first k variables $X_1^k(l)$ recursively with density $g_{E(l)}(x_1^k)$ as defined in (47) with $E^m$ substituted by $E(l)$;

draw the $n - k$ random variables $X_{k+1}^n(l)$ independently with common density $\pi^{\alpha_k}(x)$ defined in (48) with $E^m$ substituted by $E(l)$.

}

4- Define

where

$1_{\mathcal{E}_n}(l) := 1_{(a_n, \infty)}\left(S_1^n(l)/n\right).$ (51)



Some remarks for the implementation of the algorithm 

The remarks below show that ATIS is not difficult to implement. Since the first order efficiency of i.i.d. sampling schemes is reached if and only if the sampling distribution is the twisted one with parameter $a_n$ (see [8]), the present algorithm should be compared with it. The classical IS scheme which uses i.i.d. replicates with density $\pi^{a_n}$ is easy to implement but may lead to biased estimates of $P_n$; the simulation of a r.v. with density $\pi^{a_n}$ is difficult in non standard cases. When $p$ is easy to simulate then an Acceptance/Rejection algorithm can be used; however this requires truncating the support of $p$, which is precisely what should be avoided in order to obtain unbiased estimates; see [2]. When $\pi^{a_n}$ is easy to simulate, ATIS may take more time to run, due to the various intermediate calculations which are required at each stage of the algorithm.

The generation of the r.v. $X_1^k(l)$ above is easy and fast and does not require any simulation according to a twisted density. It holds

$g_{i+1}(x_{i+1}/x_1^i) = C_i\, p(x_{i+1})\, \mathfrak{n}(\alpha\beta, \alpha, x_{i+1})$ (52)

where $\mathfrak{n}(\mu, \sigma^2, x)$ is the normal density with mean $\mu$ and variance $\sigma^2$ at $x$. Here

$\alpha = s_{i,n}^2\, (n - i - 1).$

A r.v. $Y$ with density $g(x) = C p(x) \mathfrak{n}(x)$, with $C = \left(\int p(x) \mathfrak{n}(x)\, dx\right)^{-1}$ and where $p$ is a given density and $\mathfrak{n}(x) = \mathfrak{n}(\mu, \sigma^2, x)$, is easy to simulate. Denote $\mathfrak{N}$ the c.d.f. with density $\mathfrak{n}(\mu, \sigma^2, x)$. It is easily checked that $g(x)$ is the density of the r.v. $Y := \mathfrak{N}^{\leftarrow}(X)$ where $X$ is a r.v. on [0,1] with density $h(u) := C p\left(\mathfrak{N}^{\leftarrow}(u)\right)$; $\mathfrak{N}^{\leftarrow}$ denotes the reciprocal function of $\mathfrak{N}$. Now an acceptance/rejection algorithm provides a realisation of $X$. Indeed let $f(x)$ be a density such that $p\left(\mathfrak{N}^{\leftarrow}(x)\right) \leq K f(x)$ for some constant $K$ and all $x$ in [0,1]. Let $V$ be uniformly distributed on the hypograph of $Kf$, namely $V := (X_V, Y_V = K U f(X_V))$ where $X_V$ has density $f$ and $U$ is uniform on [0,1], independent of $X_V$. When $Y_V$ is less than $p\left(\mathfrak{N}^{\leftarrow}(X_V)\right)$ then $X_V$ has density $h$.
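The acceptance/rejection step just described can be sketched as follows for the simplest admissible choice, namely $f$ uniform on [0,1] and $K$ an upper bound of $p$. This is an illustration under stated assumptions, not the authors' code: the helper names, the centered exponential density used for $p$, and the numerical values of the mean and standard deviation are all hypothetical. Note that `statistics.NormalDist` is parametrized by the standard deviation, whereas the text writes the normal density with its variance.

```python
import math
import random
from statistics import NormalDist

def p_centered_exp(x):
    """Density of an exponential(1) variable shifted to live on (-1, +infinity)."""
    return math.exp(-(x + 1.0)) if x > -1.0 else 0.0

def draw_from_product(p, p_sup, mu, sigma, rng):
    """Draw Y with density proportional to p(x) * normal_density(mu, sigma**2, x).

    X is proposed uniformly on [0, 1] (f = 1, K = p_sup) and accepted when
    K * U < p(N^{-1}(X)); the accepted X is mapped to Y = N^{-1}(X).
    """
    normal = NormalDist(mu, sigma)
    while True:
        x = rng.random()
        if x <= 0.0:          # inv_cdf is undefined at 0
            continue
        y = normal.inv_cdf(x)
        if rng.random() * p_sup < p(y):
            return y

if __name__ == "__main__":
    rng = random.Random(0)
    print([draw_from_product(p_centered_exp, 1.0, 0.2, 0.5, rng) for _ in range(3)])
```

With a uniform $f$, the procedure amounts to proposing from the normal law and accepting with probability $p(y)/K$, so no twisted density is ever simulated; the support of $p$ is not truncated.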

The calculation of $g(X_1^k(l))$ above requires the value of $C_i = \left(\int p(x)\, \mathfrak{n}(\alpha\beta, \alpha, x)\, dx\right)^{-1}$ in (52). A Monte Carlo technique can be used: simulate N i.i.d. r.v's $Z_j$ with density $\mathfrak{n}(\alpha\beta, \alpha, \cdot)$, which is fast, and substitute $C_i$ by $\hat C_i := \left(\frac{1}{N} \sum_{j=1}^{N} p(Z_j)\right)^{-1}$, which provides a very accurate approximation to be inserted in the calculation of the estimate.
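The Monte Carlo approximation of the normalizing constant can be sketched in a few lines. The function name `estimate_C` and the test density are assumptions made for illustration; the normal law is specified through its mean and standard deviation.

```python
import math
import random

def estimate_C(p, mu, sigma, N, rng):
    """Monte Carlo estimate of C = 1 / integral of p(x) * normal_density(mu, sigma**2, x) dx.

    The integral is the expectation of p(Z) for Z normal with mean mu and
    standard deviation sigma, so it is replaced by an empirical mean.
    """
    mean_p = sum(p(rng.gauss(mu, sigma)) for _ in range(N)) / N
    return 1.0 / mean_p

if __name__ == "__main__":
    rng = random.Random(0)
    std_normal = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    # For p standard normal and Z standard normal the exact value is 2*sqrt(pi).
    print(estimate_C(std_normal, 0.0, 1.0, 50000, rng), 2.0 * math.sqrt(math.pi))
```

The check against $2\sqrt{\pi}$ uses the closed form $\int \varphi(x)^2 dx = 1/(2\sqrt{\pi})$ for the standard normal density $\varphi$.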

It may seem that this algorithm requires solving $Lk$ equations of the form $m(t) = \frac{n}{n-i}\left(a_n - \frac{s_1^i}{n}\right)$ in order to obtain the $t_{i,n}$ which are necessary to perform the simulation of $X_1^k(l)$ as described above, as well as the calculation of $g(x_1^n)$. This is not the case, and only L equations have to be solved. Consider for example the simulation of $X_1^k(l)$ with density $g_{E(l)}(x_1^k)$. This is achieved as follows:

1- Solve the equation

$m(t) = E(l)$

whose solution is $t_{0,n}$. Generate $X_1(l)$ according to $\pi^{E(l)}$.

2- Since 

m(t l+1 . n ) - m(tj „) = : (m(ti. n ) + X^l)) 

n — i 

use a first order approximation to derive 

U+l,n — ti.n — 7 r; — 77 r {jn{t% „) + XAl)) 

[n-i) s(t it n) 

from which (52) is derived and $X_{i+1}(l)$ can be simulated as mentioned above. In the moderate deviation scale the function $s^2(\cdot)$ does not vary from 1 and the above approximation is fair.
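The two steps above can be sketched as follows: one root-finding step for $t_{0,n}$, then a first order update of the tilting parameter along the path. The sketch assumes the centered exponential density on $(-1, \infty)$, for which $m(t) = t/(1-t)$ and $s^2(t) = 1/(1-t)^2$; the increment of $m$ is computed directly from the definition $m(t_{i,n}) = (n a - s_1^i)/(n - i)$, which is our reading of (49) and not a formula quoted from the paper. All function names are hypothetical.

```python
def m(t):
    """Mean of the tilted density for the centered exponential, t < 1."""
    return t / (1.0 - t)

def s2(t):
    """Variance of the tilted density for the centered exponential, t < 1."""
    return 1.0 / (1.0 - t) ** 2

def solve_m(target, lo=-50.0, hi=1.0 - 1e-12, tol=1e-12):
    """Solve m(t) = target by bisection; m is increasing on (-infinity, 1)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def track_t(xs, n, a):
    """Return (exact, first_order) sequences of tilting parameters along a path.

    exact[i] solves m(t) = (n*a - s_i) / (n - i); first_order[i] is obtained
    from first_order[i-1] by dividing the increment of m by s2.
    """
    t0 = solve_m(a)
    exact, first_order = [t0], [t0]
    s = 0.0
    for i, x in enumerate(xs):
        m_old = (n * a - s) / (n - i)
        s += x
        m_new = (n * a - s) / (n - i - 1)
        exact.append(solve_m(m_new))
        first_order.append(first_order[-1] + (m_new - m_old) / s2(first_order[-1]))
    return exact, first_order

if __name__ == "__main__":
    exact, approx = track_t([0.5, -0.3, 0.1], n=100, a=0.232)
    print(exact)
    print(approx)
```

Only the initial equation is solved numerically; every later parameter is obtained by one division, which is why a single equation per simulated trajectory suffices.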

Remark 18 The density $\hat g(x)$ on $\mathbb{R}^n$ is a Monte Carlo approximation of $g_n$ defined by

$g_n(x) := \int g_a(x)\, p(T = a)\, da$

where $p(T = a)$ is replaced by $p(E = a)$ and the integral is replaced by a finite mixture. M is a free parameter. Also notice that the $n - k$ i.i.d. r.v's
have common tilted density 7r Qfe (x) with parameter given by j49\ ), thus identi- 
cal to Ermakov's sampling scheme with end point in(^a n — ^-,00] , and not in 

(m(i fc _i) - 2l |0 o) . 



0.3.2 The choice of the tuning parameters 
Choosing k 

The critical parameter k is the length of the partial sum run which is to be simulated according to the density $g(x_1^k)$ as defined in (44). By (44) it would be enough to establish some statistics averaging the estimate ratios $g(x_1^j)/p_n(x_1^j)$ on a set of runs, and to select k as some j ensuring that this ratio keeps close to 1. In the case when the r.v's $X_i$ are normally distributed the density $g_i(y_{i+1}/y_1^i)$ as defined in (18) coincides with $p(y_{i+1}/y_1^i, S_1^n/n = a)$ for all values of i between 1 and $n - 1$, which entails that k can be set equal to $n - 1$. This very peculiar case is illustrated in Figure 1, for n = 100, where $P_n$ is close to 0.01. We can see that ATIS produces a very sharp estimate of $P_n$ for a small value of L when compared to the classical IS scheme.
In the other cases, when $g_i(y_{i+1}/y_1^i)$ approximates $p(y_{i+1}/y_1^i, S_1^n/n = a)$ only under some conditions on k as described in Conditions (A), we propose the following heuristic, which works well and is easy to implement; other choices are possible, which provide similar acceptable results. Instead of $g$ consider the following construction, which will also be used in the IS algorithm: simulate $E^1, \ldots, E^M$, i.i.d. with distribution (43), and define the density on $\mathbb{R}^{j+1}$



gOo) : = Jj J2 9E^(x{)n Em (x Q ) 

m— 1 

where qe™ is defined as 

i=l 

where gi + i(x i+1 /x\) is defined in (fT8j) for i > 1. The density g(xg) is a Monte 

Carlo approximation of g (%o) ■ 

By (f3"5|) and following the same heuristics as for g define , with a new set of 
i.i.d. E m, s 

1 M 

771=1 

We use Lemma [5] in order to obtain an explicit approximation for It holds 



[X -x /T-E j v (S? = nE<") P \ X ° 



p(S™ = nE m ) 
n exp-nI(E m ) 



P (n=4) (i+ (i)) 



Define therefore 



/ \ 17, 7c*p-(n- j)I (— (E m - ^) . 



and 



m— 1 

Fix some integer L, the number of runs to be simulated in order to fix k; L need not be large. For every run index $l$ up to L draw independently a random variable $E^l$ with density (43) and the run $X_0^j(l)$ with density $g_{E^l}$ defined as in (20) with k substituted by j and a by $E^l$.

Fix k as the smallest j which indicates a departure of this statistic from 1.
The choice of M



In ATIS the distribution in (44) is substituted by a numerical approximation of



which is suboptimal with respect to (44) but is easily implemented. A Monte Carlo procedure produces $\hat g(x_1^n)$ as described above in (46). It appears that M should be large when k is large. For example in the normal case with n = 100, for k = 60, M = 30 produces excellent estimates for values of L of order 5000, whereas for k = 98 the value of M should increase up to 2000, with the same L, as seen in Figure 2. The reason for this increase in M is that (53) is a mixture of densities in very high dimension, which seems very sensitive with respect to the approximation of the mixture measure. This point deserves a specific study, outside the scope of the present paper. However the normal case is quite specific, since it allows k to be as close to n as wanted. In the other cases, as exemplified in the figures pertaining to the exponential case, k is restricted to lower values and M is rather low.

0.3.3 Asymptotic efficiency of the adaptive twisted IS scheme 

The evaluation of the performance of IS algorithms is a controversial matter. Many criteria are at hand, for example the probability of hits, which counts the relative number of simulations hitting the target $(a_n, \infty)$, or the variance of the estimator. We refer to the book by Bucklew [4] for a discussion on the relative merits of each approach.

The variance of an IS estimate of $P_n$ under the sampling density $g$ writes



With our proposal we cannot provide an order of magnitude of the variance of our IS estimate, since the properties necessary to define it have been obtained only on typical paths under the sampling density $g$ defined in (45) and not on the whole space $\mathbb{R}^n$ (except in the case when the $X_i$'s are normally distributed). We will prove, however, that the performance of this new procedure can be considered favorably. Not surprisingly the loss of performance with respect to the optimal sampling density $p_{X_1^n/\mathcal{E}_n}$ is due to the $n - k$ last i.i.d. simulations, leading to a quasi-MSE of the estimate proportional to $\sqrt{n - k}$.

In order to discuss this we first go back to the classical IS scheme, for which we evaluate the asymptotic variance.




(53) 



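$\mathrm{Var}\, \hat P_g(\mathcal{E}_n) = \frac{1}{L}\left(E_g\left(\hat P_g(l)\right)^2 - P_n^2\right)$

with

$\hat P_g(l) := 1_{\mathcal{E}_n}(l)\, \frac{p(X_1^n(l))}{g(X_1^n(l))}.$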
The variance of the classical IS scheme and a discussion on efficiency

The asymptotic variance of the estimate of $P(\mathcal{E}_n)$ can be evaluated as follows.

The classical IS is defined simulating L times a random sample of n i.i.d. r.v's $X_1^n(l)$, $1 \leq l \leq L$, with tilted density $\pi^{a_n}$. The standard IS estimate is defined through

where the $X_i(l)$ are i.i.d. with density $\pi^{a_n}$ and $1_{\mathcal{E}_n}(l)$ is as in (51). Set

$\hat P_n(l) := 1_{\mathcal{E}_n}(l)\, \frac{\prod_{i=1}^{n} p(X_i(l))}{\prod_{i=1}^{n} \pi^{a_n}(X_i(l))}.$
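For standard normal summands the classical scheme is fully explicit: the tilted density $\pi^{a_n}$ is $N(a_n, 1)$ and the likelihood ratio of a sample equals $\exp(-a_n S_1^n + n a_n^2/2)$. The sketch below averages L such weighted indicators; it is an illustration under the normality assumption, and the function name and parameter values are ours.

```python
import math
import random

def classical_is_normal(n, a_n, L, rng):
    """Classical twisted IS estimate of P(S_n / n > a_n) for N(0,1) summands.

    Each replication draws n i.i.d. N(a_n, 1) variables and contributes the
    indicator of the event times the likelihood ratio p / pi^{a_n}.
    """
    total = 0.0
    for _ in range(L):
        s = sum(rng.gauss(a_n, 1.0) for _ in range(n))
        if s > n * a_n:
            total += math.exp(-a_n * s + n * a_n * a_n / 2.0)
    return total / L

if __name__ == "__main__":
    rng = random.Random(0)
    estimate = classical_is_normal(100, 0.232, 10000, rng)
    exact = 0.5 * math.erfc(0.232 * math.sqrt(100) / math.sqrt(2.0))
    print(estimate, exact)
```

In this Gaussian setting the exact probability is available through the normal tail, which gives a benchmark for the estimate (about 0.0102 for n = 100 and $a_n = 0.232$).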

The variance of $\hat P_n$ is given by

The relative accuracy of the estimate $\hat P_n^{IS}$ is defined through
It holds

Proposition 19 The relative accuracy of the estimate is given by

$RE(\hat P_n) = \frac{\sqrt{2\pi}\sqrt{n}}{L}\, a_n (1 + o(1))$ as n tends to infinity.

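Proof. It holds, omitting the index $l$ for brevity and noting $a$ for $a_n$,

$E_{\pi^a}\left(\hat P_n\right)^2 = E_P\, 1_{\mathcal{E}_n}(X_1^n)\, \frac{p(X_1^n)}{\pi^a(X_1^n)} = \phi^n(t^a) \exp(-n a t^a) \int_{na}^{\infty} \exp\left(-t^a (s - na)\right) p_{S_1^n}(s)\, ds.$

The Laplace integral above satisfies

$\int_{na}^{\infty} \exp\left(-t^a (s - na)\right) p_{S_1^n}(s)\, ds = P_n (1 + o(1))$

as n tends to infinity, which, together with the expansion

$\phi^n(t^a) \exp(-n a t^a) = P_n \sqrt{n}\sqrt{2\pi}\, t^a (1 + o(1))$

(which holds when $\lim_{n\to\infty} a\sqrt{n} = \infty$) concludes the proof. We have used Lemma [6] to assess that $\lim_{n\to\infty} s(t^a) = 1$. ■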
We now come to a discussion of the above result. It is well known that the variance is not a satisfactory criterion to describe the variability of the outcomes of a random phenomenon: for example, a sequence of symmetric r.v's $X_n$ taking values $-\exp \exp n$, 0, $\exp \exp n$ with relative frequencies defined through $P(X_n = \exp \exp n) = \exp(-n)$ has variance going to $\infty$ while being concentrated at 0. In this case we can define an increasing family of sets $B_n$ with $P(X_n \in B_n) \to 1$ on which $E\left(1_{B_n} X_n^2\right) = 0$, a much better indicator, obtained through trimming. We will prove that such an indicator cannot be defined for the classical IS scheme, so that the variance rate obtained in Proposition 19 is indeed meaningful.

The easy case when $X_1, \ldots, X_n$ are i.i.d. with standard normal distribution is sufficient for our need.

The variance of the IS estimate is proportional to

$V := E_P\, 1_{(n a_n, \infty)}(S_1^n)\, \frac{p(X_1^n)}{\pi^{a_n}(X_1^n)} - P_n^2 = E_P\, 1_{(n a_n, \infty)}(S_1^n) \left(\exp \frac{n a_n^2}{2}\right) \left(\exp -a_n S_1^n\right) - P_n^2.$

A set $B_n$ suited to reducing the MSE should penalize large values of $-S_1^n$ while bearing nearly all the realizations of $S_1^n$ under the i.i.d. sampling scheme $\pi^{a_n}$ as n tends to infinity. It should therefore be of the form $(n b_n, \infty)$ for some $b_n$ so that

(a) $\lim_{n\to\infty} E_{\pi^{a_n}} 1_{(n b_n, \infty)}(S_1^n) = 1$

and

(b) $\limsup_{n\to\infty} \frac{1}{V}\, E_P\, 1_{(n a_n, \infty) \cap (n b_n, \infty)}(S_1^n)\, \frac{p(X_1^n)}{\pi^{a_n}(X_1^n)} < 1$

which means that the IS sampling density $\pi^{a_n}$ can lead to a MSE defined by

$MSE(B_n) := E_P\, 1_{(n a_n, \infty) \cap (n b_n, \infty)}(S_1^n)\, \frac{p(X_1^n)}{\pi^{a_n}(X_1^n)} - P_n^2$

with a clear gain over the variance indicator. However when $b_n < a_n$ (b) does not hold and when $b_n > a_n$ (a) does not hold.

So no reduction of this variance can be obtained by taking into account the properties of the typical paths generated under the sampling density: a reduction of the variance is possible only by conditioning on "small" subsets of the sample paths space. On no class of subsets of $\mathbb{R}^n$ with probability going to 1 under the sampling is it possible to reduce the variability of the estimate, whose rate is definitely proportional to $\sqrt{n}$, imposing a burden of order $L\sqrt{n}\, a_n$ in order to achieve a relative efficiency of $\alpha\%$ with respect to $P_n$.






The MSE of our estimate on a growing class of typical paths 



We will evaluate the performance of our estimate under $g$ since the algorithm involves technical parameters (typically M); in practice the Monte Carlo approximation introduces no significant bias.

Contrary to what was just shown above, the procedure which we propose has a small asymptotic variability when evaluated through trimming on classes of subsets of $\mathbb{R}^n$ whose probability goes to 1 under the sampling $g$. These subsets of $\mathbb{R}^n$ get smaller and smaller as n increases, as measured through the MSE of the estimate with respect to the MSE of the classical IS estimate.

We prove the existence of these trimming sets in the present section and state that the resulting gain in terms of the MSE of our estimate is the proper measure of its performance.

These sets are the $C_n$ described in the following Lemma, whose proof is deferred to the appendix. For the sake of notational simplicity denote $\varepsilon_n$ the $\varepsilon'_n$ defined in (44).

Lemma 20 With the just mentioned $\varepsilon_n$, define the family of sets $C_n$ in $\mathbb{R}^n$ such that for all $x_1^n$ in $C_n$,

$\left| \frac{p_n(x_1^k)}{g(x_1^k)} - 1 \right| < \varepsilon_n$

and

$\left| \frac{m(t_k)}{a_n} - 1 \right| < \delta_n$

where $t_k$ is defined through

$m(t_k) = \frac{n}{n-k}\left(a_n - \frac{s_1^k}{n}\right)$

and $\delta_n$ satisfies

$\lim_{n\to\infty} \delta_n = 0$

together with

$\lim_{n\to\infty} \delta_n a_n \sqrt{n - k} = \infty.$

Then

$\lim_{n\to\infty} G(C_n) = 1.$

Furthermore on $C_n$

$t_k s(t_k) = a_n (1 + o(1)).$ (54)



We now prove that our IS algorithm provides a net improvement over the classical IS scheme in terms of Mean Square Error when evaluated on this family of sets.

Define

$RE(\hat P_n) := \frac{E_g\left(1_{C_n} \hat P_n(l)\right)^2 - P_n^2}{L\, P_n^2}$

with

$\hat P_n(l) := 1_{\mathcal{E}_n}(l)\, \frac{p(X_1^n(l))}{g(X_1^n(l))}.$

We prove that

Proposition 21 The relative accuracy of the estimate $\hat P_n$ is given by

$RE(\hat P_n) = \frac{\sqrt{2\pi}\sqrt{n - k - 1}}{L}\, a_n (1 + o(1))$ as n tends to infinity.

Proof. Denote $E_{\mathfrak{P}_n}$ the expectation with respect to the p.m. $\mathfrak{P}_n$ of $X_1^n(l)$ conditioned upon $\mathcal{E}_n(l) := (S_1^n(l)/n > a_n)$; we omit the index $l$ for brevity. Using the definition of $C_n$ we get



The second line uses $A_{n,\varepsilon_n}$. The third line is Bayes formula. The fourth line is Lemma 3. The fifth line uses (54) and uniformity in Lemma 3, where the conditions in Corollary 6.1.4 of Jensen (1995) are easily checked since, in his notation, $J(\theta) = \mathbb{R}$, condition (i) holds for $\theta$ in a neighborhood of 0 (indeed $\theta$ is restricted to such a set in our case), (ii) clearly holds and (iii) is (9). ■

Proposition 22 When $a_n = n^{-\tau}$ then under (A) the ratio of the relative efficiencies of the Adaptive IS algorithm with respect to the standard IS scheme is of order $\sqrt{n - k}/\sqrt{n}$. The same result holds when $a_n = (\log n)^{\alpha}$.

0.4 Importance Sampling for M-estimators

This Section provides some application of the previous results for some classical types of estimators for which sharp moderate deviation probabilities can be obtained through linear approximations. We follow closely the work by [8]; see also pp.

Let $T$ denote a real valued statistical functional defined on the space $M_{\mathcal{F}}$, where we assume that $T$ has an Influence Function. Let $P$ be a given p.m. We assume that for all $Q$ in $M_{\mathcal{F}}$ there exists a function $g$ (depending on $P$) such that



(55)
where $N$ is a seminorm defined on $M_{\mathcal{F}}$, continuous in the $\tau_{\mathcal{F}}$ topology, and $\omega$ is a continuous and strictly monotone function which satisfies $\omega(t)/t \to 0$ as $t \to 0$.

The function $g$ is the Influence Function of $T$ at $P$. The class $\mathcal{F}$ considered here contains S(R) ∪ {g}.

Let $\psi(x,t)$ be a real valued function defined on $\mathbb{R}^2$ and assume that $P$ satisfies $\int |\psi(x,t)|\, dP < \infty$. Define $T(P)$ as any solution of the equation

$\int \psi(x,t)\, dP = 0$ (56)

if defined. When $P = P_t$ depends upon a real valued parameter $t$ such that

$T(P_t) = t$

then $T$ is Fisher consistent and the substitution of $P_t$ by $P_n$ in (56), the empirical measure pertaining to an i.i.d. sample with unknown p.m. $P_{t_0}$, provides a consistent estimate of $t_0$ under appropriate regularity conditions; see [?]. Such an estimate is an M-estimator. We assume that all conditions M1 to M5 in [8] hold, which implies that (55) above holds (see [5] Theorem 4.2). Also in this case the function $g$ writes

The same situation holds for L-estimators.

When (10) holds in $M_{\mathcal{F}}$ it can be checked that a strong MDP holds for $T(P_n)$; indeed when $g$ belongs to the class $\mathcal{F}$ and

$\lim_{n\to\infty} (n a_n^2)^{-1} \log\left[ n P_{t_0}\left( |g(X_1)| > n a_n \right) \right] = -\infty$

then using (55) and (10) it can be proved that the remaining term in $T(P_n) - T(P_{t_0})$ is negligible w.r.t. the linear approximation $\int g(x, t_0)\, dP_n$ on the moderate deviation scale, as follows from (2.14) and (2.15) in [8]. Furthermore in this case the strong moderate deviation holds for $P_{t_0}\left( |T(P_n) - T(P_{t_0})| > a_n \right)$ and

r Pto (T(P n )-T(P t0 )>a n ) 
in the range a n = n~ a , | < a < ^; . see also Inglot, Kallenberg and Ledwina 

0.5 Simulation results

0.5.1 The gaussian case

Typical paths under the final value

This graph illustrates Proposition [?].

[Figure: Typical paths of conditioned random walks (gauss). Legend: typical path under S(n)/n = 0.232; i.i.d. twisted sampling.]

Figure 1 Gauss: estimator vs k

The graph shows the role of k in the behavior of the estimate. The $X_i$'s are standard normal, n = 100 and $P_n = 10^{-2}$. When k is less than 70 the new estimate improves on the classical i.i.d. scheme. A change in M leads to no significant change (here M = 30). The value of L is L = 2000.

Figure 2 Gauss: MSE (theoretical and empirical)

The graph illustrates the accuracy of the asymptotic results in Propositions 19 and 21. The $X_i$'s are standard normal, n = 100, $P_n = 10^{-2}$, k = 60.

Figure 3 Gauss

The graph is an illustration of Proposition [?]. The r.v's $X_i$ are standard normal, n = 100 and $P_n = 10^{-2}$. The ordinate is the ratio of the empirical value of the MSE of the adaptive estimate w.r.t. the empirical MSE of the i.i.d. twisted one. The value of k is k = 60; this ratio stabilizes to $\sqrt{n - k}/\sqrt{n}$ for large L, in full accordance with Proposition 22.
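As a quick numerical reading of the limiting ratio quoted for Figure 3, the following sketch evaluates $\sqrt{n - k}/\sqrt{n}$; the function name is hypothetical and the values of k other than 60 are shown only for comparison.

```python
import math

def efficiency_ratio(n, k):
    """Limiting MSE ratio sqrt(n - k) / sqrt(n) of the adaptive scheme
    relative to the classical i.i.d. twisted scheme."""
    return math.sqrt(n - k) / math.sqrt(n)

if __name__ == "__main__":
    for k in (0, 60, 80, 98):
        print(k, round(efficiency_ratio(100, k), 3))
```

For n = 100 and k = 60 the ratio is $\sqrt{0.4} \approx 0.632$, and it equals 1 when k = 0, that is for the classical scheme itself.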

0.5.2 The exponential case

Typical paths

The graphs above are typical paths under the conditional distribution (with $S_n/n = 0.239$) and under the i.i.d. sampling with tilted density. The value of n is 100 and the approximation of the conditional density of the random walk is fair up to k = 80, as indicated by the fact that the IS estimator of $P_n$ is correct up to k = 80, which can be seen as a pertinent indicator.

The random variables $X_i$ are i.i.d. with exponential distribution with parameter 1 on $(-1, \infty)$. The case treated here is $P(S_1^n/n > a_n) = P_n$ with n = 100, $P_n = 0.013887$ and $a_n = 0.232$. These values are computed through a very long run of the standard IS algorithm (with i.i.d. sampling according to the twisted density) and are used as a benchmark. The estimates are calculated with L = 1000, and L = 10000 for k = 0, i.e. for the classical i.i.d. twisted sampler (lower values of L lead to unstable estimates).

Figure 3 Gauss: gain of the adaptive twisted scheme

Figure 1 exp: estimate vs k
0.6 Appendix

0.6.1 Proof of Proposition [1] 

We first state

Lemma 23 Denote $q^* := dQ^*/d\lambda$ ($\lambda$ the Lebesgue measure). Then $q^*(y) = x y\, p(y)$.

Proof. By Theorem 3.4 (2) in Broniatowski and Keziou (2006) it holds $q^*(y) = (\alpha y + \beta) p(y)$ for some constants $\alpha$ and $\beta$. The projection $Q^*$ satisfies both $\int v\, dQ^*(v) = x$ and $\int dQ^*(v) = 0$, which yield $\alpha = x$ and $\beta = 0$. ■
For any set $A$ in B(R), it holds

$P\left(X_1 \in A / (S_1^n/n > a_n x)\right) = P(A) + a_n x\, Q^*(A) + o(a_n).$ (57)

Indeed it holds

-\ (P (Xi e A/S n>x ) - P(A)) = -\e (i a (X) - P(A)/S n>x ) 



^(4 (^MX,)-^))/^) 





p 

— OO 



\ nat 



Observe that $\mathcal{E}_{n,x} = \{M_n \in \Omega_x\}$. Also denote $A_t^+$ (resp. $A_t^-$) the subset of $M(\mathbb{R})$ defined through $A_t^+ := \{Q \in M(\mathbb{R}) : Q(\mathbb{R}) = 0, \int 1_A(v)\, dQ(v) > t\}$, resp. $A_t^- := \{Q \in M(\mathbb{R}) : Q(\mathbb{R}) = 0, \int 1_A(v)\, dQ(v) < t\}$. Using Bayes formula and the above moderate deviation result (10) it follows that for any measurable set $G$ in $M(\mathbb{R})$

$\lim_{n\to\infty} \Pr\left(M_n \in G / M_n \in \Omega_x\right) = 1$ if $Q^*$ belongs to $G$, and 0 otherwise.

Proof. For any positive (resp. negative) t it then holds lim n ^oo Pr (M„ £ A^ /M n € £1. T ) 
1 if £ < Q*(A) and Q*(A) > (resp lim IWOO Pr (M„ e A t _ /M„ e fi x ) = 1 
if £ > and < ), which is to say going to the limit in n, 

that lim rwoo £ (P (X! e A/£ 1hX ) - P(A)) = f_ Q ,- {A) dt + J Q * +(A) dt where 
Q* = Q* + — Q*~ is the Lebesgue decomposition of Q*. This closes the proof of 
(|57|) . A second order expansion of ir anX (y) in a neighborhood of t = yields 

TT^Cy) = (1 + a„a;t/ + a 2 n x 2 g n {y))p{y). 

Hence for all Borel set A it holds n anX (y)dy = P(A)+a n xQ* (A)+a 2 l x 2 J A g n (y))p(y)dy. 
Since both J A Tr a " x (y)dy tends to P(A) and Q* is a finite measure it follows that 
a l x2 I a 9n(y))p(y)dy tends to 0. ■ 



0.6.2 Two Lemmas pertaining to the partial sum under 
its final value 

We now state two lemmas which describe some functions of the random vector $X_1^n$ conditioned on $\mathcal{E}_n$.

Lemma 24 Assume that (A) holds. Then for all $i$ between 1 and $k$

$\frac{n}{n-i}\left(a_n - \frac{S_1^i}{n}\right) = a_n + O_{\mathfrak{P}_n}\left(\frac{1}{\sqrt{n - i}}\right).$

Proof. Select s in (a n ,b n ) and denote P* the p.m on R™ conditioned on 
(S? = ns) It holds 

Vn — i (m in - a n ) = \Jn-i ( 1+1 - s ) + y/n-i (a n - s) . 

\n-t J 

We prove that for m = n — i 

varps (^Jm ^— sJJ = 0(1) 

as m — > oo where varpsZ denotes the variance of Z conditionally on f^- = . 
Integrating with respect to the distribution of S™ conditioned upon £ n concludes 
the proof. Using ■ 
$p\left(X_1 = x / S_1^n = ns\right) = \frac{p_{S_2^n}(ns - x)\, p_{X_1}(x)}{p_{S_1^n}(ns)} = \frac{\pi^s_{S_2^n}(ns - x)\, \pi^s(x)}{\pi^s_{S_1^n}(ns)}$

with $m(t) = s$, normalizing both $\pi^s_{S_2^n}(ns - x)$ and $\pi^s_{S_1^n}(ns)$ and making use of a first order Edgeworth expansion in those expressions yields

and 



E P s (X 2 ) =s 2 (t)+s 2 + o(£) 



With a similar development for the joint density p„(Xi = x, X 2 = y), using the 
same tilted distribution 7r* it readily follows that 

Eps (XiX 2 ) = s 2 + f - 

Since 

warpaS™ = m(m - 1)E P . (XiX 2 ) + m£ P . (X 2 ) - to 2 £ P s (X : ) 2 

it follows that when m/n tends to 0, then varpsS™ = m(l + o(l)). Since 
m < n — k this amounts to 

n — k 
lim = 0. 

n — >oo ft 

Integration with respect to the distribution of S™ conditioned upon £ n and 
splitting the integcral on (a n ,a n + c„) and (a n + c„, 00), using (C2) concludes 
the proof. 

Remark 25 It can be proved that

$\sqrt{m}\left(\frac{S_1^m}{m} - a_n\right) \Rightarrow N(0,1)$ when $m/n \to 0$

conditionally on $(S_1^n/n > a_n)$. This result is to be compared with the Gibbs principle for moderate deviations stated in the Introduction, which asserts that for fixed m the joint distribution of $(X_1, \ldots, X_m)$ conditioned upon $\mathcal{E}_n$ converges weakly, as $n \to \infty$, to the joint distribution of m r.v's $X_1^*, \ldots, X_m^*$ which are independent copies of $X^*$. The above result says that even for sequences m depending upon n, we may replace the original m variables by the m independent tilted ones when exploring the behavior of $S_1^m$ under $\mathcal{E}_n$, since $\sqrt{m}\left(\frac{S_1^{*m}}{m} - a_n\right)$ shares the same limit distribution.

We also need the order of magnitude of $\max(X_1, \ldots, X_k)$ under $\mathfrak{P}_n$, which is stated in the following result.

Lemma 26 It holds for all k between 1 and n

$\max(X_1, \ldots, X_k) = O_{\mathfrak{P}_n}(\log n).$

Let $s$ be such that $n a_n < s < n(a_n + c_n)$. Denote $P_n^s$ the probability measure of $X_1^n$ given the value of $S_1^n = s$. Since

$\mathfrak{P}_n\left(\max(X_1, \ldots, X_k) > t\right) = \int P_n^s\left(\max(X_1, \ldots, X_k) > t\right) p\left(S_1^n = s / \mathcal{E}_n\right) ds$

we first state the order of magnitude of $\max(X_1, \ldots, X_k)$ under $P_n^s$ in the next Lemma.

Lemma 27 For all k between 1 and n, $\max(X_1, \ldots, X_k) = O_{P_n^s}(\log k)$.
Proof. Define r := s/n. For all t it holds 

P^(max(Xi,...,X fc )>t) < kP°(X n >t) 

/■oo 
p(X„ = w/Si l = «)p(S? - s/5„)du 

Center and normalize both S™ and S" _1 with respect to the density 7r r in the 
last line above, denoting 7r£ the density of S™ := (S™ — nr) / ' s T %Jn when X has 
density 7r T with mean r and variance s^, we get 

P„ s (max(X 1 ,...,X fe )>i) < k-^= n T (X n = u) 

y/n-l J t 

<T7 (ST 1 = (nr - « - (n - l)r)) / (.s T V^T 



< (S? - o) 

Under the sequence of densities tt t the triangular array (X 1; ...,X„) obeys a 
first order Edgeworth expansion 

P„ s (max(X 1 ,...,X fc ) >t) < k-^L= [°° n T (X n = u) 



du. 



n ((t - u) js T ^n - 1) P (u, i, n) + o(l) 
n(0) + o(l) 

7T T (X„ = tt) dtt. 

for some constant Cst independent of n and r and where 
P (u, i, n) := 1 + P 3 ((t - u) /s T Vn- l) 
where $P_3(x) = \frac{\mu_3^{(\tau)}}{6\left(\sigma^{(\tau)}\right)^3}(x^3 - 3x)$ is the third Hermite polynomial; $\left(\sigma^{(\tau)}\right)^2$ and $\mu_3^{(\tau)}$ are the second and third centered moments of $\pi^{\tau}$. We used uniformity upon $u$ in the remaining term of the Edgeworth expansions. Let $t^{\tau}$ be such that $m(t^{\tau}) = \tau$. Making use of Chernoff Inequality

k f°° 

P*(max(Xi,...,X fc ) > t) < -j— exp - (C - t T ) u du 

p{t T ) Jt 

(tr + ) 
0(*r) 



for any A such that <j){t T + A) is finite. 

i/logfc — > co 

it holds 

f5(max(Xi,...,X fc ) <t) -» 1, 

which proves the lemma. ■ 
We now prove Lemma [26] 
As above write 

«p n (max(Xi,...,X fc )>t) < k% l (X n >t) 

< k 



p{S 7 { = S /£ n )d s 

where r is defined as in the above Lemma through r := s/n. Use the same argument as in Lemma 27 to assess that when t/log n goes to infinity then the RHS above tends to 0. This closes the proof.



0.6.3 Proof of Lemma | 

It holds 



p„(n fe ) = 



P(£n) 

npx^) r°° / n ( t 



t M Ui 



By Lemma [T4l it holds under (CI) 



—3. — n -I- R 



where i?„ := Om„ f > 0. Denote S :=^% . Set 

f b 00 p s (^ = P(S>6) 
/ a °°Ps(«)d« P(S>a) 
with $a := \frac{n}{n-k}\left(a_n - \frac{s_1^k}{n}\right)$ and $b := \frac{n}{n-k}\left(b_n - \frac{s_1^k}{n}\right)$. It holds

?t — r£ \ TL J 71 ■ \ 71 1 

p„ (if) = (1 + 7) / " p (if/T = a) p (T = a) da. 

J a n 

Use Lemma ?? to obtain 

P ( S > On + 



where a„ := a„ + Otp n 



/ = 



i — A; 



P(S>a„) 
a n (1 + oqj„(l)) . Use Lemma [3] to obtain 



I = (exp— nc n a n ) ( exp 



2 2 

n — k 



which tends to under (C). 



0.6.4 Proof of Lemma | 

The approximation in (|44[) holds only on 



In the above display, 
By the above definition 



4 



Az x 



P n {x\) 



i)7i — k 



g(zf) 

lim <p„ (A„ £) 



Note also that 



(58) 



> 



J lA„, tn (^)gK)^ = | l^(x*)g(^)cfe? 
(l + o(l)) 



1 



1 + e 
1 



l+e n 



which goes to 1 as n tends to oo, where we have used Proposition [T5] In the 
above displays g (x'n is the density of Xf when X™ is sampled under g. We 
have just proved that the sequence of sets A n ^ En contains roughly all the sample 
paths X" under the importance sampling density g. 
We use the fact that t% defined through 



m(ffc) 



- k 



n 
is close to $a_n$ under $\mathfrak{P}_n$ uniformly upon $a$ in $(a_n, b_n)$.
Let $\delta_n$ tend to 0 with $\lim_{n\to\infty} a_n \delta_n \sqrt{n - k} = \infty$ and



= |a:?: 


m(t k ) 


- 1 


< o" n j 




a„ 






tkS{t k ) - 


= a n (1 


+ o 


1)) 



We prove that on B n 
holds. 

By Lemma [24] and (C3) 

lim <p„ (B n ) = 1. 
n — >oo 

There exists S' n such that for any x™ in £?„ 



Indeed 





«n 


- 1 






m(i fe ) l 

On 




t k (l + v k ) 
a n 





and limn^oo Vk — 0. Therefore 

Vktk 



1 On < < 1 h 0„ 

^77, CI77, 



Since m ^ fc - ) is bounded so is — and therefore — > as n 

mi. 

Further (RTf]) implies that there exists <5 n " such that 

tks(tk) 



1 



Indeed 



- 1 



t k (1 + u fc ) 



1 



< 6' n + {l + S' n )u k = 6 n " 

where lim^oo Uk — 0. Therefore (|59|) holds. 
Define 

Since 

y i c „ w) g (if) di? > y i Cn pn(a:?)d 

and by ([5Sj) and flBUJl 

lim *p„ (C„) = f 

n — »oo 

we obtain 

lim G(C„) = 1. 

n — >oo 

which concludes the proof. 